Abstract

Least  squares  projections  are  a  useful  way  of  describing  the  relationship 
between  random  variables.   These  include  conditional  expectations  and 
projections  on  additive  functions.   Sample  least  squares  provides  a  convenient 
way  of  estimating  such  projections.   This  paper  gives  convergence  rates  and 
asymptotic  normality  results  of  least  squares  estimators  of  linear  functionals 
of  projections.   General  results  are  derived,  and  primitive  regularity 
conditions  given  for  power  series  and  splines.   Also,  it  is  shown  that 
mean-square continuity of a linear functional is necessary for $\sqrt{n}$-consistency
and  sufficient  under  conditions  for  asymptotic  normality,  and  this  result  is 
applied  to  estimating  the  parameters  of  a  finite  dimensional  component  of  a 
projection  and  to  weighted  average  derivatives  of  projections. 


Keywords:   Nonparametric  regression,  additive  interactive  models,  partially 
linear  models,  average  derivatives,  polynomials,  splines,  convergence  rates, 
asymptotic  normality. 


1.    Introduction 

Least  squares  projections  of  a  random  variable  y  on  functions  of  a 
random  vector  q  provide  a  useful  way  of  describing  the  relationship  between 
y  and  q.   The  most  familiar  example  is  the  conditional  expectation  E[y|q], 
which  is  the  projection  on  the  linear  space  of  all  (measurable,  finite 
mean-square)  functions  of  q.   Estimation  of  this  projection  is  the 
nonparametric  regression  problem.   Motivated  partly  by  the  difficulty  of 
estimating  E[y|q]   when  q  has  high  dimension,  projections  on  smaller  sets  of 
functions  have  been  considered,  by  Breiman  and  Stone  (1978),  Breiman  and 
Friedman  (1985),  Friedman  and  Stuetzle  (1981),  Stone  (1985),  Zeldin  and  Thomas 
(1977).   These  include  projections  on  the  set  of  functions  that  are  additive 
in  linear  combinations  of  q,   and  generalizations  to  allow  the  component 
functions  to  be  multi-dimensional. 

One  simple  way  to  estimate  nonparametric  projections  is  by  regression  on 
a  finite  dimensional  subset,  with  dimension  allowed  to  grow  with  the  sample 
size,  e.g.  as  in  Agarwal  and  Studden  (1980),  Gallant  (1981),  Stone  (1985),  Cox 
(1988),  and  Andrews  (1991),  which  will  be  referred  to  here  as  series 
estimation.   This  type  of  estimator  may  not  be  good  at  recovering  the  "fine 
structure"  of  the  projection  relative  to  other  smoothers,  e.g.  see  Buja, 
Hastie,  and  Tibshirani  (1989),  but  is  computationally  simple.   Also,  the  fine 
structure  is  less  important  for  mean-square  continuous  functionals  of  the 
projection,  such  as  the  parameters  of  partially  linear  models  or  weighted 
average derivatives (examples discussed below), which are essentially averages.

This  paper  derives  convergence  rates  and  asymptotic  normality  of  series 
estimators  of  projection  functionals.   Convergence  rates  are  important  because 
they  show  how  dimension  affects  the  asymptotic  accuracy  of  the  estimators 
(e.g. Stone 1982, 1985) and are useful for the theory of semiparametric
estimators that depend on projection estimates (e.g. Newey 1991). Asymptotic
normality is useful for statistical inference about functionals of projections,
such as derivatives. The paper gives mean-square rates for estimation of the
projection and uniform convergence rates for estimation of functions and
derivatives.   Conditions  for  asymptotic  normality  and  consistent  estimation  of 
asymptotic  standard  errors  are  given,  and  applied  to  estimation  of  a  component 
of  an  additive  projection  and  its  derivatives.   Fully  primitive  regularity 
conditions  are  given  for  power  series  and  spline  regression,  as  well  as  more 
general  conditions  that  may  apply  to  other  types  of  series.   The  regularity 
conditions  allow  for  dependent  observations,  so  that  they  are  of  use  for  time 
series  models. 

The  paper  also  relates  continuity  properties  of  linear  functionals  of 
projections to $\sqrt{n}$-consistent estimation. Under a regularity condition on the
projection residual variance, continuity in mean-square is shown to be
necessary for existence of a (regular) $\sqrt{n}$-consistent estimator, and sufficient
for asymptotically normal series estimators. This result is used to derive
$\sqrt{n}$-consistency results for partially linear models with an additive
nonparametric component (a generalization of the model of Engle et al., 1984)
and for weighted average derivatives of (possibly) additive models (a
generalization of the Stoker (1986) functional).

One  problem  that  motivates  the  results  given  here  is  estimation  of  an 
additive  nonparametric  autoregression, 

(1.1)  $y_t = h_1(y_{t-1}) + \cdots + h_r(y_{t-r}) + \varepsilon_t$,

where $\varepsilon_t$ is the residual from the projection of $y_t$ on additive functions
of $r$ lags. This model avoids high dimensional arguments but allows for
several lags, which seems useful for short time series. The convergence rates
and asymptotic normality results apply to estimates of this projection,
although asymptotic normality here requires it to be a dynamic regression,
where $E[\varepsilon_t | y_{t-1}, y_{t-2}, \ldots] = 0$. Effects of lagged values can be quantified by
the weighted average derivative $\int w(y_{t-j})[\partial h_j(y_{t-j})/\partial y_{t-j}]\,dy_{t-j}$, $(j = 1, \ldots, r)$.
The results to follow include primitive regularity conditions for
$\sqrt{n}$-consistency and asymptotic normality of series estimators of this
functional, and could also be applied to generalizations that allow
interactions between lags in equation (1.1).

These  results  are  an  addition  to  previous  work  on  the  subject,  including 
that of Agarwal and Studden (1980), Gallant (1981), Stone (1985), Cox (1988),
Andrews  (1991),  Andrews  and  Whang  (1990),  because  the  results  (including 
asymptotic  normality)  do  not  require  that  the  projection  equal  the  conditional 
expectation,  similarly  to  Stone  (1985),  but  not  to  the  others.   In  addition, 
the  results  allow  for  dependent  observations,  and  apparently  improve  in  some 
respects  on  those  of  Cox  (1988)  and  Andrews  (1991)  for  the  special  case  of 
conditional  expectations.   There  is  some  overlap  of  the  convergence  rates  with 
a  recent  paper  by  Stone  (1990)  on  additive-interactive  spline  estimators, 
which  the  author  saw  only  after  the  first  version  of  this  paper  was  written. 
Stone's  rate  results  (1990)  are  implied  by  those  of  Section  7  below,  under 
conditions  that  are  weaker  in  some  respects  (allow  for  dependence)  and 
stronger  in  others  (imposing  a  side  condition  on  allowed  number  of  terms). 
Also,  the  same  convergence  rate  result  is  given  in  Section  6  for  variable 
degree  polynomial  regression,  which  is  not  considered  in  Stone  (1990),  and 
uniform  rates  and  asymptotic  normality  are  shown  here. 


2.    Series  Estimators 

The  results  of  this  paper  concern  estimators  of  least  squares  projections 
that can be described as follows. Let $z$ denote a data observation, $y$ and
$q$ (measurable) functions of $z$, with $q$ having dimension $r$. Let $\mathcal{H}$
denote a mean-square closed, linear subspace of the set of all functions of
$q$ with finite mean-square. The projection of $y$ on $\mathcal{H}$ is

(2.1)  $h_0(q) = \operatorname{argmin}_{h \in \mathcal{H}} E[\{y - h(q)\}^2]$.

An example is the conditional expectation, $h_0(q) = E[y|q]$, where $\mathcal{H}$ is the
set of all measurable functions of $q$ with finite mean-square. An important
generalization has $q = (q_1', q_2')'$, where $q_1$ has finite mean-square, $q_{2\ell}$,
$(\ell = 1, \ldots, L)$ are subvectors of $q_2$, and

(2.2)  $\mathcal{H} = \{q_1'\beta + \sum_{\ell=1}^{L} h_{2\ell}(q_{2\ell}) : E[h_{2\ell}(q_{2\ell})^2] < \infty\}$.

Primitive conditions for this set to be closed are given in Section 6. This
is a smaller set of functions, whose consideration is motivated partly by the
difficulty of estimating conditional expectations for $q$ with many
dimensions; e.g. see Stone (1985) for discussion and references. The general
theory allows for any closed $\mathcal{H}$ (e.g. $\mathcal{H} = \{w(q)[\sum_{\ell} h_\ell(q_\ell)]\}$, $w(q)$ a
known function, under conditions for this to be closed), and primitive
conditions are given for power series and spline estimators of a projection
on the $\mathcal{H}$ of equation (2.2).

The estimators of $h_0(q)$ considered here are sample projections on a
finite dimensional subspace of $\mathcal{H}$, which can be described as follows. Let
$p^K(q) = (p_{1K}(q), \ldots, p_{KK}(q))'$ be a vector of functions, each of which is an
element of $\mathcal{H}$. Denote the data observations by $y_i$ and $q_i$,
$(i = 1, 2, \ldots)$, and let $y = (y_1, \ldots, y_n)'$ and $p^K = [p^K(q_1), \ldots, p^K(q_n)]'$,
for sample size $n$. An estimator of $h_0(q)$ is

(2.3)  $\hat h(q) = p^K(q)'\hat\pi$,  $\hat\pi = (p^{K\prime}p^K)^{-}p^{K\prime}y$,

where $(\cdot)^{-}$ denotes a generalized inverse, and $K$ subscripts for $\hat h(q)$ and
$\hat\pi$ have been suppressed for notational convenience. The matrix $p^{K\prime}p^K$ will
be asymptotically nonsingular under conditions given below, making the choice
of generalized inverse asymptotically irrelevant.
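As an illustration only, the sample projection in (2.3) can be sketched in a few lines of Python. The sketch assumes a scalar regressor, the power basis $(1, q, \ldots, q^{K-1})$, and a nonsingular second moment matrix, so that an ordinary inverse can stand in for the generalized inverse. The function names `solve`, `power_basis`, `series_fit`, and `h_hat` are hypothetical and are not taken from the paper.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def power_basis(q, K):
    """Illustrative basis p^K(q) = (1, q, ..., q^(K-1)) for a scalar q."""
    return [q ** k for k in range(K)]


def series_fit(qs, ys, K):
    """Coefficients (P'P)^{-1} P'y, assuming P'P is nonsingular."""
    P = [power_basis(q, K) for q in qs]
    PtP = [[sum(row[a] * row[b] for row in P) for b in range(K)] for a in range(K)]
    Pty = [sum(row[a] * y for row, y in zip(P, ys)) for a in range(K)]
    return solve(PtP, Pty)


def h_hat(q, pi_hat):
    """Fitted projection p^K(q)' pi_hat at the point q."""
    return sum(c * p for c, p in zip(pi_hat, power_basis(q, len(pi_hat))))


# Noise-free quadratic data: a three-term series reproduces it up to rounding.
qs = [i / 10 for i in range(-10, 11)]
ys = [1 + 2 * q - 3 * q * q for q in qs]
pi_hat = series_fit(qs, ys, 3)
```

Solving the normal equations directly keeps the sketch close to the formula; in practice a numerically more stable least squares routine would usually be preferred.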

Often the object of interest is not the projection, but rather some
functional $A(h_0)$ of $h_0(q)$, where $A(h)$ is an $s \times 1$ vector of real
numbers. Examples are $A(h) = \partial^{\lambda} h(\bar q)$, a partial derivative evaluated at
some point $\bar q$, and $A(h) = h_j(\bar q_j) - h_j(\tilde q_j) = h(\tilde q_1, \ldots, \bar q_j, \ldots, \tilde q_L) - h(\tilde q)$ for
$\mathcal{H} = \{\sum_{\ell=1}^{L} h_\ell(q_\ell)\}$, the difference of the $j$th component of an additive
projection at two different points.

Another example is the parameters of the finite dimensional component of
a projection, i.e. $\beta$ in $h_0(q) = q_1'\beta + \sum_{\ell=1}^{L} h_{2\ell}(q_{2\ell})$. Estimators of such
parameters have been analyzed by Chamberlain (1986), Heckman (1986), Rice
(1986), Robinson (1988), Schick (1986), and others, but only under the
conditions $h_0(q) = E[y|q]$ and $L = 1$. Here the nonparametric component can
be additive, which leads to a more efficient estimator if $h_0(q) = E[y|q]$ and
$\mathrm{Var}(y|q)$ is constant: see Section 5. Let $\mathcal{H}_2 = \{\sum_{\ell=1}^{L} h_{2\ell}(q_{2\ell})\}$, assume for
the moment that $\mathcal{H}_2$ is closed, let $P(q_1|\mathcal{H}_2)$ denote the vector of
projections of each element of $q_1$ on $\mathcal{H}_2$, and $M =
E[\{q_1 - P(q_1|\mathcal{H}_2)\}\{q_1 - P(q_1|\mathcal{H}_2)\}']$. Assuming that $M$ is nonsingular, the
identification condition for $\beta$, it follows that

(2.4)  $\beta = A(h_0)$,  $A(h) = M^{-1}E[\{q_1 - P(q_1|\mathcal{H}_2)\}h(q)]$.


In addition, once $\beta$ is in hand, one can consider linear functionals of
$h_2(q_2)$ by subtracting off $q_1'\beta$ for any $q_1$, e.g. for $L = 1$, $h_2(q_2) =
h_0(\bar q_1, q_2) - \bar q_1'M^{-1}E[\{q_1 - P(q_1|\mathcal{H}_2)\}h_0(q)]$, where $\bar q_1$ is some specified value of
$q_1$.

Another interesting example is a weighted integral of a partial
derivative of $h(q)$, of the form

(2.5)  $A_j(h) = \int w_j(q)\,\partial^{\lambda_j} h(q)\,dq$,  $(j = 1, \ldots, s)$,

for multi-indices $\lambda_j$ and functions $w_j(q)$. This is an average derivative
functional  similar  to  that  of  Stoker  (1986),  including  the  nonparametric 
autoregression  from  Section  1.   Estimators  of  similar  functionals  have  been 
analyzed  by  Hardle  and  Stoker  (1989),  Powell,  Stock,  and  Stoker  (1989),  and 
Andrews  (1991). 

In this paper most of the analysis will concern linear functionals of $h$,
such as the above examples. The natural "plug-in" estimators of linear
functionals have a simple form. Let $A = (A(p_{1K}(q)), \ldots, A(p_{KK}(q)))'$. Because
$\hat h(q)$ is a linear combination of elements of $\mathcal{H}$, linearity of $A(h)$ implies

$A(\hat h) = A'\hat\pi$.

The paper focuses on linear functionals because the linearity in $\hat\pi$ of this
estimator leads to straightforward asymptotic distribution theory for $A(\hat h)$.
Of course, the delta-method can also be used to analyze nonlinear functions of
such estimators.
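As a worked illustration of the plug-in form (a sketch under assumptions that are not in the paper: a scalar regressor, the power basis with $k$th term $q^{k-1}$, and the derivative of the projection at a fixed point $\bar q$ as the functional), the weight vector is obtained by applying the functional to each basis term:

```latex
\[
A = \big(0,\; 1,\; 2\bar q,\; \ldots,\; (K-1)\,\bar q^{\,K-2}\big)', \qquad
A(\hat h) = A'\hat\pi = \sum_{k=2}^{K} (k-1)\,\bar q^{\,k-2}\,\hat\pi_k .
\]
```

The estimator of the derivative is therefore a fixed linear combination of the least squares coefficients, which is what makes its distribution theory straightforward.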

The idea of sample projection estimators is that they should approximate
$h_0(q)$ if $K$ is allowed to grow with the sample size. The two key features of
this approximation are that 1) each component of $p^K(q)$ is an element of $\mathcal{H}$,
and 2) $p^K(q)$ "spans" $\mathcal{H}$ as $K$ grows (i.e. for any function in $\mathcal{H}$, $K$ can
be chosen big enough that there is a linear combination of $p^K(q)$ that
approximates it arbitrarily closely in mean square). Under 1), $\hat\pi$ estimates
$\pi = (E[p^K(q_i)p^K(q_i)'])^{-1}E[p^K(q_i)y_i] = (E[p^K(q_i)p^K(q_i)'])^{-1}E[p^K(q_i)h_0(q_i)]$,
the coefficients of the projection of $h_0(q)$ on $p^K(q)$. Thus, under 1) and
2), $p^K(q)'\pi$ will approximate $h_0(q)$. Consequently, when the estimation
error in $\hat\pi$ is small, $\hat h(q)$ should approximate $h_0(q)$.

Two  types  of  approximating  functions  will  be  considered  in  detail.   They 
are: 


Power Series: Let $\lambda = (\lambda_1, \ldots, \lambda_r)'$ denote an $r$-dimensional vector of
nonnegative integers, i.e. a multi-index, with norm $|\lambda| = \sum_{j=1}^{r} \lambda_j$, and let
$q^{\lambda} = \prod_{j=1}^{r} q_j^{\lambda_j}$. For a sequence $(\lambda(k))_{k=1}^{\infty}$ of distinct such vectors, a power
series approximation corresponds to

(2.6)  $p_{kK}(q) = \begin{cases} q_{1k}, & k = 1, \ldots, s, \\ q_2^{\lambda(k-s)}, & k = s+1, \ldots, \end{cases}$

allowing for the finite dimensional component of $\mathcal{H}$ discussed above.
Throughout the paper it will be assumed that $\lambda(k-s)$ are ordered in the
natural way, with the degree of the terms $|\lambda(k-s)|$ monotonically increasing
in $k$. Also, an additive $q_2$ component of the projection will be allowed by
including in $\{q_2^{\lambda(k-s)}\}$ only those terms with components that are subvectors
of some $q_{2\ell}$, i.e. by requiring that for each multi-index $\lambda(k-s)$ there
exists a $q_{2\ell}$ such that the only nonzero components of $\lambda(k-s)$ are those
where the corresponding component of $q_2$ is included in $q_{2\ell}$. The spanning
condition will be that all such terms appear in $\{\lambda(k-s)\}$. All of these
requirements are summarized in the statement that $\{q_2^{\lambda(k-s)}\}$ consists of all
multivariate powers of each $q_{2\ell}$, ordered so that $|\lambda(k-s)|$ is monotonic
increasing.


The theory to follow uses orthogonal polynomials, which may also have
computational advantages. If each $q_2^{\lambda(k-s)}$ is replaced with the product of
orthogonal polynomials of order corresponding to components of $\lambda(k-s)$, with
respect to some weight function on the range of $q_2$, and the distribution of
$q_2$ is similar to this weight, then there should be little collinearity among
the different $q_2^{\lambda(k-s)}$. The estimator will be numerically invariant to such a
replacement (because $|\lambda(k-s)|$ is monotonically increasing), but it may
alleviate the well known multicollinearity problem for power series.
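The collinearity point can be checked exactly in a simple case. The sketch below assumes a scalar regressor that is uniformly distributed on $[-1,1]$ and compares the population second moment matrix of the raw powers with that of the Legendre polynomials, which are orthogonal with respect to the uniform weight. All names (`moment`, `inner`, `gram`, `monomials`, `legendre`) are hypothetical helpers written for this illustration.

```python
from fractions import Fraction


def moment(k):
    """E[q^k] for q uniform on [-1, 1]: 1/(k+1) for even k, zero for odd k."""
    return Fraction(1, k + 1) if k % 2 == 0 else Fraction(0)


def inner(a, b):
    """E[a(q) b(q)] for polynomials stored as coefficient lists, lowest degree first."""
    return sum(ai * bj * moment(i + j) for i, ai in enumerate(a) for j, bj in enumerate(b))


def gram(polys):
    """Second moment matrix of a list of polynomials."""
    return [[inner(a, b) for b in polys] for a in polys]


def monomials(K):
    """Coefficient lists of 1, q, ..., q^(K-1)."""
    return [[Fraction(0)] * k + [Fraction(1)] for k in range(K)]


def legendre(K):
    """First K Legendre polynomials via (n+1) P_{n+1} = (2n+1) q P_n - n P_{n-1}."""
    polys = [[Fraction(1)], [Fraction(0), Fraction(1)]]
    for n in range(1, K - 1):
        shifted = [Fraction(0)] + polys[n]            # coefficients of q * P_n
        prev = polys[n - 1] + [Fraction(0)] * 2       # P_{n-1} padded to the same length
        polys.append([(Fraction(2 * n + 1) * s - n * p) / (n + 1)
                      for s, p in zip(shifted, prev)])
    return polys[:K]


G_power = gram(monomials(4))      # has nonzero off-diagonal entries such as 1/3
G_legendre = gram(legendre(4))    # diagonal, with entries 1/(2n+1)
```

Because both bases span the same space of polynomials, the fitted values are unchanged; only the conditioning of the second moment matrix differs.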


Splines: Splines, which are smooth piecewise polynomials, can be formulated
as projections if their knots (the joining points for the polynomials) are
held fixed. They have attractive features relative to power series, being
less oscillatory and less sensitive to bad approximation over small regions.
The theory here requires that the knots be placed in the support of $q_{2i}$,
which therefore must be known. For convenience the support is normalized to
be $\mathcal{Q} = \prod_{j=1}^{r-s}[-1,1]$, and attention restricted to evenly spaced knots. Let
$(\cdot)_+ = 1(\cdot > 0)(\cdot)$. An $m$th degree spline with $L+1$ evenly spaced knots on
$[-1,1]$ is a linear combination of

$P_{k,L}(u) = \begin{cases} u^k, & 0 \le k \le m, \\ \{[u + 1 - 2(k-m)/(L+1)]_+\}^m, & m+1 \le k \le m+L. \end{cases}$

For a set of multi-indices $\{\lambda(k)\}$, with $\lambda_j(k) \le m+L-1$ for each $j$ and $k$,
the approximating functions for $q_2$ will be products of univariate splines,
i.e.

(2.7)  $p_{kK}(q) = \begin{cases} q_{1k}, & k = 1, \ldots, s, \\ \prod_{j=1}^{r-s} P_{\lambda_j(k-s),L}(q_{2j}), & k = s+1, \ldots, K. \end{cases}$


Note that implicit in $K$ is a choice of number of knots for each of the
components of $q_2$ and a choice of which multiplicative components to include.
Throughout the paper it will be assumed that the ratio of numbers of knots
for each pair of elements of $q_2$ is bounded above and below. An additive $q_2$
component of the projection will be allowed for by requiring that the
multi-indices satisfy the same condition as for power series, which can be
summarized in the statement that the $q_2$ terms consist of all interactions of
components that appear in any $q_{2\ell}$.
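The univariate spline basis and its products can be sketched as follows. This is a minimal illustration, assuming degree $m \ge 1$, the support $[-1,1]$, and knots at $-1 + 2j/(L+1)$ for $j = 1, \ldots, L$, which is how the truncated terms above are read here. The names `spline_basis` and `tensor_term` are hypothetical.

```python
def spline_basis(u, m, L):
    """Truncated-power basis of degree m (m >= 1) with knots at -1 + 2j/(L+1), j = 1..L.

    Returns the m + L + 1 values (1, u, ..., u^m, (u - t_1)_+^m, ..., (u - t_L)_+^m).
    """
    powers = [u ** k for k in range(m + 1)]
    knots = [-1 + 2 * j / (L + 1) for j in range(1, L + 1)]
    truncated = [max(u - t, 0.0) ** m for t in knots]
    return powers + truncated


def tensor_term(q2, index, m, L):
    """Product over components j of the index[j]-th univariate basis function at q2[j]."""
    value = 1.0
    for qj, kj in zip(q2, index):
        value *= spline_basis(qj, m, L)[kj]
    return value


# Linear spline (m = 1) with a single knot at zero (L = 1).
left = spline_basis(-0.5, 1, 1)    # truncated term is zero to the left of the knot
right = spline_basis(0.5, 1, 1)    # truncated term equals u - 0 to the right of the knot
```

The truncated-power form mirrors the definition in the text; B-splines span the same space and are usually better conditioned numerically.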

The condition that the support of $q_2$ is $\mathcal{Q}$ is not as restrictive as it
may first appear. Suppose that there are "original" variables $x$ with
support $R^{r-s}$, and $t(\cdot)$ is a univariate one-to-one transformation with
domain $R$ and range $[-1,1]$. Then $q_2 = (t(x_1), \ldots, t(x_{r-s}))'$ will have
support $\mathcal{Q}$. Since additive projections are invariant to such componentwise,
one-to-one transformations, the spline estimator based on $q_2$ will also
estimate the original projection. Of course, the condition that the support
of $x$ is $R^{r-s}$ is restrictive. Also, the bounds on derivatives of the
projection imposed to obtain convergence rates in what follows are
restrictive. The transformation must be continuously differentiable with
positive derivative to preserve differentiability of the projection, and
boundedness of the derivatives will require that the derivatives of the
original projection go to zero as $x$ grows faster than the derivatives of
the transformation. For example, if $t(\cdot) = 2F(\cdot) - 1$, where $F(x)$ is a CDF
that is continuously differentiable of all orders, then the order of
differentiability of the projection is preserved under the transformation, but
boundedness of derivatives requires that the derivatives of the projection
go to zero faster than the density of $F$ as $x$ goes to infinity.

Fixed, evenly spaced knots are restrictive, and are motivated by
theoretical convenience. A judicious choice of transformation may help
alleviate effects of evenly spaced knots. If a distribution function is used
to transform the data, as discussed above, and the distribution matches
closely the true distribution, then the transformed variable will be "spread
out," which can improve splines with fixed evenly spaced knots. Allowing for
estimated knots (e.g. via smoothing splines, as in Wahba, 1984) is known to
lead to more accurate estimates, but is outside the scope of this paper.

The theory to follow uses B-splines, which are a linear transformation of
the above basis that is nonsingular on $[-1,1]$ and has low multicollinearity.
The low multicollinearity of B-splines and the recursive formula for their
calculation also lead to computational advantages; e.g. see Powell (1981).

Series estimates depend on the choice of the number of terms $K$, so
that it is desirable to choose $K$ based on the data. With a data-based
choice of $K$, these estimates have the flexibility to adjust to conditions
in the data. For example, one might choose $K$ by delete-one cross
validation, by minimizing the sum of squared residuals $\sum_{i=1}^{n}[y_i - \hat h_{-i}(x_i)]^2$,
where $\hat h_{-i}(x_i)$ is the estimate of the regression function computed from all
the observations but the $i$th. Some of the results to follow will allow for
data based $K$.
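Delete-one cross validation can be sketched directly from its definition. The example below is an illustration under simplifying assumptions (a scalar regressor, a power basis, and refitting from scratch for each deleted observation); the names `solve`, `fit`, `predict`, `cv_score`, and `choose_K` are hypothetical.

```python
def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit(qs, ys, K):
    """Least squares coefficients on the power basis (1, q, ..., q^(K-1))."""
    P = [[q ** k for k in range(K)] for q in qs]
    PtP = [[sum(r[a] * r[b] for r in P) for b in range(K)] for a in range(K)]
    Pty = [sum(r[a] * y for r, y in zip(P, ys)) for a in range(K)]
    return solve(PtP, Pty)


def predict(q, coef):
    """Fitted value at q for the given coefficients."""
    return sum(c * q ** k for k, c in enumerate(coef))


def cv_score(qs, ys, K):
    """Sum of squared delete-one prediction errors for a K-term series."""
    total = 0.0
    for i in range(len(qs)):
        coef = fit(qs[:i] + qs[i + 1:], ys[:i] + ys[i + 1:], K)
        total += (ys[i] - predict(qs[i], coef)) ** 2
    return total


def choose_K(qs, ys, candidates):
    """Candidate number of terms with the smallest cross-validation score."""
    return min(candidates, key=lambda K: cv_score(qs, ys, K))


qs = [i / 6 - 1 for i in range(13)]
ys = [q * q for q in qs]          # noise-free quadratic, used only as a check
best = choose_K(qs, ys, [1, 2, 3])
```

Refitting $n$ times is the most transparent implementation; for linear smoothers the same scores can be obtained from a single fit using the diagonal of the projection matrix.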
3.    Regularity Conditions

This  section  lists  and  discusses  some  fundamental  regularity  conditions 
on  which  all  the  following  results  are  based.   The  first  Assumption  limits 
dependence  of  the  observations. 


Assumption 3.1: $\{(y_i, q_i)\}$ is stationary and $\alpha$-mixing with mixing
coefficients $\alpha(t) = O(t^{-\mu})$, $(t = 1, 2, \ldots)$, for $\mu > 2$.


The stationarity assumption could be relaxed, but this is not done in order to
keep the notation as simple as possible. The results will also make use of
moment conditions on $y$. Let $u = y - h_0(q)$, $u_i = y_i - h_0(q_i)$. Also, for a
matrix $D$ let $\|D\| = [\mathrm{trace}(D'D)]^{1/2}$, for a random matrix $Y$, $|Y|_v =
\{E[\|Y\|^v]\}^{1/v}$, $v < \infty$, and $|Y|_\infty$ the infimum of constants $C$ such that
$\mathrm{Prob}(\|Y\| < C) = 1$.

Assumption 3.2: $|u_i|_s$ is finite for $s \ge 2$ and $E[u_i^2|q_i]$ is bounded.

The  bounded  second  conditional  moment  assumption  is  quite  common  in  the 
literature  (e.g.  Stone,  1985).   Apparently  it  can  be  relaxed  only  at  the 
expense  of  affecting  the  convergence  rates  (e.g.  see  Newey,  1990),  so  to  avoid 
further  complication  this  assumption  is  retained. 

Assumption 3.3: Either a) $z_i$ is uniform mixing with mixing coefficients
$\phi(t) = O(t^{-\mu})$, $(t = 1, 2, \ldots)$, for $\mu > 2$; or b) there exists $c(t)$ such
that $|E[u_i u_{i+t}|q_i, q_{i+t}]| \le c(t)$ and $\sum_{t=1}^{\infty} c(t) < \infty$.

This assumption is restrictive, but covers many cases of interest, including
a dynamic nonparametric regression with $h_0(q_i) = E[y_i|q_i, y_{i-1}, q_{i-1}, y_{i-2}, \ldots]$.

The next Assumption is useful for controlling $(p^{K\prime}p^K/n)^{-}$.
Assumption 3.4: For the support $\mathcal{Q}$ of $q_i$: i) For each $K$ there is
$\{P_{kK}(q)\}_{k=1}^{K}$ and a nonsingular, constant matrix $A_K$ with $p^K(q)' =
(P_{1K}(q), \ldots, P_{KK}(q))A_K$ for all $q \in \mathcal{Q}$; ii) There is $\zeta_0(K)$ such that
$\max_{k \le K}|P_{kK}(q_i)|_\infty \le \zeta_0(K)$; iii) For $P^K(q) = (P_{1K}(q), \ldots, P_{KK}(q))'$ there is a
probability measure $P$ with $P(q_i \in \tilde{\mathcal{Q}}) \ge cP(q \in \tilde{\mathcal{Q}})$ for any measurable set $\tilde{\mathcal{Q}} \subseteq
\mathcal{Q}$ and the smallest eigenvalue of $\int P^K(q)P^K(q)'\,dP(q)$ is bounded away from
zero as $K \to \infty$.

The bounds in ii) give a convergence rate for $p^{K\prime}p^K/n$, while iii) controls
its singularity. Hypothesis iii) is essentially a normalization that loads
all of the restrictions onto ii). Without this type of normalization the
second moment matrix can be ill-conditioned, leading to technical
difficulties. For example, if $\mathcal{Q} = R$, $q$ is uniformly distributed on $[0,1]$,
and $p_{kK}(q) = q^{k-1}$, then $E[p^K(q)p^K(q)'] = [\sigma_{ij}]$, $\sigma_{ij} = 1/(i+j-1)$, which
has a smallest eigenvalue that goes to zero faster than $K$ factorial.
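The ill-conditioning in this example can be made concrete with exact arithmetic. The sketch below builds the second moment matrix with entries $1/(i+j-1)$ and computes its determinant exactly. Since the matrix is positive definite, its smallest eigenvalue is at most the $K$th root of the determinant, so the printed bound gives a simple (and rather loose) upper bound; it is not the rate stated in the text. The names `hilbert`, `determinant`, and `lambda_min_upper_bound` are hypothetical.

```python
from fractions import Fraction


def hilbert(K):
    """Second moment matrix of (1, q, ..., q^(K-1)) for q uniform on [0, 1]."""
    return [[Fraction(1, i + j + 1) for j in range(K)] for i in range(K)]


def determinant(a):
    """Exact determinant by elimination; pivots are positive for a positive definite matrix."""
    m = [row[:] for row in a]
    det = Fraction(1)
    for col in range(len(m)):
        det *= m[col][col]
        for r in range(col + 1, len(m)):
            f = m[r][col] / m[col][col]
            for c in range(col, len(m)):
                m[r][c] -= f * m[col][c]
    return det


def lambda_min_upper_bound(K):
    """det = product of K positive eigenvalues, so lambda_min <= det ** (1/K)."""
    return float(determinant(hilbert(K))) ** (1.0 / K)


for K in (2, 4, 6, 8):
    print(K, lambda_min_upper_bound(K))
```

Even this loose bound shrinks quickly as $K$ grows, which is why the normalization in iii) is imposed on a transformed basis rather than on the raw powers.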

One approach to verifying this assumption is to find a lower bound $\lambda(K)$
on the smallest eigenvalue of $E[p^K(q)p^K(q)']$, and then let $P^K(q) =
p^K(q)[\lambda(K)]^{-1/2}$, as in Newey (1988a) for power series. Another approach is
to let $P^K(q)$ be a transformation that is orthonormal with respect to some
density and assume that the true distribution dominates the one corresponding
to that density, and to use known bounds for orthonormal functions, as in
Cox (1988), Newey (1988b), and Andrews (1991) for power series. A third
approach is to find a transformation that is well conditioned, though not
orthogonal, as for B-splines in Agarwal and Studden (1980) and Stone (1985).

The next Assumption specifies the way in which the number of terms is
allowed to depend on the data.
Assumption 3.5: $K = K(z_1, \ldots, z_n, n)$ such that i) There are $\underline{K}(n) \le \bar{K}(n)$
with $\underline{K}(n) \le K \le \bar{K}(n)$ with probability approaching one; ii) $p^{\underline{K}(n)}(q)$ is a
subvector of $p^K(q)$, which is a subvector of $p^{\bar{K}(n)}(q)$, for all $\underline{K}(n) \le K \le
\bar{K}(n)$.


That  is,   K   is  allowed  to  be  random,  but  must  lie  between  nonrandom  upper 
and  lower  limits  with  probability  approaching  one,  and  the  approximating 
functions must be nested. Nonrandom $K$ is included as a special case where
$\underline{K} = \bar{K}$. These upper and lower limits control variance and bias, respectively
(the larger $K$ is, the less bias there is but the more variance). It would be
interesting  to  derive  such  upper  and  lower  limits  for  specific  choices  of  K 
(e.g.  cross-validation),  but  these  results  are  beyond  the  scope  of  this 
paper. 

Some  results  below  will  require  that  the  transformed  approximating 
functions  are  also  nested: 

Assumption 3.6: Assumption 3.4 is satisfied and for $P^K(q)$ of Assumption
3.4, $P^{\underline{K}}(q)$ is a subvector of $P^K(q)$, which is a subvector of $P^{\bar{K}}(q)$,
for all $\underline{K}(n) \le K \le \bar{K}(n)$.

This  Assumption  is  satisfied  for  power  series  but  not  for  splines,  so  that  the 
following  spline  results  are  limited  to  the  nonrandom  K  case. 

The  next  Assumption  imposes  a  rate  condition  for  convergence  of  the 
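second moment matrix. Let $\mathcal{K}(n)$ denote the number of elements of
$P^{\bar{K}(n)}(q)P^{\bar{K}(n)}(q)'$ that are nonzero for some $q \in \mathcal{Q}$, and for notational
convenience suppress the $n$ argument in $K$, $\underline{K}$, $\bar{K}$, and $\mathcal{K}$ henceforth.

Assumption 3.7: $\mathcal{K}\zeta_0(\bar{K})^2/n \to 0$.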
This is a side condition that will be maintained throughout. It limits the
growth  rate  of  K  in  a  way  that  may  be  nonoptimal  in  the  mean-square  error 
sense,  as  discussed  in  Sections  6  and  7,  although  it  is  weaker  than  similar 
side  conditions  imposed  by  Cox  (1988)  and  Andrews  (1991). 

The  bias  of  these  estimators  depends  on  the  error  from  the  finite 
dimensional  approximation.   Sobolev  norms  will  be  used  to  quantify  this 
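error. For a measurable function $f(q)$ defined on $\mathcal{Q}$, let

$|f(q)|_{d,v} = \max_{|\lambda| \le d} |\partial^{\lambda} f(q_i)|_v = \max_{|\lambda| \le d} \{E[|\partial^{\lambda} f(q_i)|^v]\}^{1/v}$,

$|f(q)|_{d,\infty} = \max_{|\lambda| \le d} \sup_{q \in \mathcal{Q}} |\partial^{\lambda} f(q)|$.

The norm $|f(q)|_{d,v}$ will be taken to be infinite if $\partial^{\lambda} f(q)$ does not exist
for some $|\lambda| \le d$. Inclusion of derivatives in these norms will be useful for
deriving properties of $\partial^{\lambda} \hat h(q)$.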

Many  of  the  results  will  be  based  on  the  following  polynomial 
approximation  rate  hypothesis: 

Assumption 3.8: For each class of functions $\mathcal{F}$, there exists $C = C(\mathcal{F},d,v)$
and $\alpha = \alpha(\mathcal{F},d,v)$ such that for all $f \in \mathcal{F}$,

$\min_{\pi \in R^K} |f(q) - p^K(q)'\pi|_{d,v} \le CK^{-\alpha}$.


This  condition  is  not  primitive,  but  is  known  to  be  satisfied  in  many  cases. 
Primitive  conditions  for  power  series  and  splines  are  given  in  Sections  6  and 
7. 

In  order  for  the  same  bias  bounds  to  apply  to  estimated  functionals  of 
interest, it is necessary that $A(h)$ be continuous with respect to the
same norm as in Assumption 3.8, which is imposed in the following condition.
Assumption 3.9: For the $d$ and $v$ of Assumption 3.8, $A(h)$ is a continuous
linear functional with respect to the Sobolev norm $|h|_{d,v}$, i.e. there is $C$
such that for all $h \in \mathcal{H}$, $\|A(h)\| \le C|h|_{d,v}$.


4.    Convergence Rates

This  Section  gives  mean-square  convergence  rates  for  h(q)   and  uniform 
consistency  rates  for  its  derivatives  and  continuous  linear  functionals.   The 
results  include  both  sample  and  population  mean-square  error  rates. 

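Theorem 4.1: If Assumptions 3.1 - 3.5 and 3.7 are satisfied for $d = 0$
and $\mathcal{F} = \{h_0\}$, then

$\sum_{i=1}^{n}[\hat h(q_i) - h_0(q_i)]^2/n = O_p(\bar{K}/n + \underline{K}^{-2\alpha}\{\sum_{K=\underline{K}}^{\bar{K}}(K/\underline{K})^{-\alpha v}\}^{2/v})$.

If Assumption 3.6 is also satisfied, then for the CDF $F(q)$ of $q_i$,

$\int[\hat h(q) - h_0(q)]^2\,dF(q) = O_p(\bar{K}/n + \underline{K}^{-2\alpha}\{\sum_{K=\underline{K}}^{\bar{K}}(K/\underline{K})^{-\alpha v}\}^{2/v})$.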


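The two terms in the convergence rate essentially correspond to variance and
bias. The bias term, which is $\underline{K}^{-2\alpha}\{\sum_{K=\underline{K}}^{\bar{K}}(K/\underline{K})^{-\alpha v}\}^{2/v}$, will be equal to
$\underline{K}^{-2\alpha}$ if $\underline{K} = \bar{K}$, and for $v > 1/\alpha$ is bounded above by a constant multiple of $\underline{K}^{-2\alpha+2/v}$. A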
consequence of the second conclusion is convergence rates for some version of
the additive components, because $\mathcal{H}$ closed implies that the mapping from
$h(q)$ to some decomposition into additive components is mean-square continuous
(see Bickel et al., 1990, Appendix).
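To see how the variance and bias terms trade off, consider the special case of a nonrandom number of terms, so that the upper and lower limits coincide and the mean-square rate reduces to $K/n + K^{-2\alpha}$. The following calculation is a sketch that treats $K$ as continuous and ignores the side condition of Assumption 3.7, which may rule out the balancing choice:

```latex
\[
\frac{d}{dK}\Big(\frac{K}{n} + K^{-2\alpha}\Big)
  = \frac{1}{n} - 2\alpha K^{-2\alpha-1} = 0
\quad\Longrightarrow\quad
K^{*} = (2\alpha n)^{1/(1+2\alpha)},
\]
\[
\frac{K^{*}}{n} + (K^{*})^{-2\alpha} = O\big(n^{-2\alpha/(1+2\alpha)}\big).
\]
```

A larger approximation rate $\alpha$ therefore calls for slower growth in the number of terms and yields a mean-square rate closer to $1/n$.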

Uniform  convergence  rates  depend  on  bounds  for  the  derivatives  of  the 
series terms.

Assumption 4.1: For each $k \le K$, $P_{kK}(q)$ is differentiable of order $p$ and
for all multi-indices $\lambda$ with $|\lambda| \le p$ there is $\zeta_{|\lambda|}(K)$ such that with
probability one $\max_{k \le K}|\partial^{\lambda} P_{kK}(q_i)| \le \zeta_{|\lambda|}(K)$ and $\zeta_{|\lambda|}(K) \ge \zeta_0(K)$.

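Theorem 4.2: If Assumptions 3.1 - 3.8 are satisfied for $d = |\lambda| \le p$, $v =
\infty$, and $\mathcal{F} = \{h_0\}$, and Assumption 4.1 is satisfied, then

$\sup_{q \in \mathcal{Q}} |\partial^{\lambda}\hat h(q) - \partial^{\lambda} h_0(q)| = O_p(\bar{K}^{1/2}\zeta_{|\lambda|}(\bar{K})[(\bar{K}/n)^{1/2} + \underline{K}^{-\alpha}])$.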

The  uniform  convergence  rate  for  h(q)   is  slower  than  the  mean-square  rate 
and  does  not  attain  Stone's  (1982)  bounds,  although  it  is  faster  than 
previously  derived  rates  for  series  estimators,  as  further  discussed  below. 

Convergence  rates  for  continuous  linear  functionals  of  h(q)  will  follow 
from  this  result. 

Theorem 4.3: If Assumptions 3.1 - 3.9 and 4.1 are satisfied for $v = \infty$ and
$\zeta_{|\lambda|}(K)$ is monotonically increasing in $|\lambda|$, then $A(\hat h) - A(h_0) =
O_p(\bar{K}^{1/2}\zeta_d(\bar{K})[(\bar{K}/n)^{1/2} + \underline{K}^{-\alpha}])$.

The implied convergence rate for mean-square continuous linear functionals is
not sharp, as they are shown to be $\sqrt{n}$-consistent (under slightly stronger
conditions) in Section 5.
5.   Asymptotic Normality

An estimator of the asymptotic variance of $A'\hat\pi$ can be formed from the
usual estimator of the asymptotic variance of the projection coefficients $\hat\pi$.
The asymptotic normality result below will require that the products of
elements of $\mathcal{H}$ and the residual be martingale differences, so that no
autocorrelation correction is required. Let

    $\hat\Sigma = P'P/n$,   $\hat V = \sum_{i=1}^n p^K(q_i)p^K(q_i)'[y_i - \hat h(q_i)]^2/n$.

Treating K as fixed, the White (1980) estimator of the asymptotic variance
of the projection coefficient estimator $\hat\pi$ is $\hat\Sigma^{-}\hat V\hat\Sigma^{-}$. Since $A'\hat\pi$ is a linear
combination of $\hat\pi$, a corresponding estimator of the asymptotic variance of $A'\hat\pi$
is

    $\hat\Omega = A'\hat\Sigma^{-}\hat V\hat\Sigma^{-}A/n$.

This  estimator  is  consistent  as  K  grows,  under  conditions  to  follow. 
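
As a rough illustration of how a variance estimator of this sandwich form could be computed, the following Python sketch makes several simplifying assumptions that are not in the text: a scalar regressor, the power series basis (1, q, ..., q^(K-1)), a scalar functional represented by a K-vector `a`, and a nonsingular sample second-moment matrix (so an ordinary inverse stands in for the generalized inverse). All function names are hypothetical.

```python
import random


def power_series(q, K):
    """Return the K-vector (1, q, ..., q^(K-1)) for a scalar q."""
    return [q ** k for k in range(K)]


def solve(M, b):
    """Solve M x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(M)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def functional_and_variance(q, y, K, a):
    """Return (a' pi_hat, omega_hat) for a least squares power series fit.

    omega_hat = a' S^{-1} V S^{-1} a / n, with S = P'P/n and
    V = sum_i p_i p_i' u_i^2 / n, where u_i are the fitted residuals.
    """
    n = len(y)
    P = [power_series(qi, K) for qi in q]
    S = [[sum(p[j] * p[k] for p in P) / n for k in range(K)] for j in range(K)]
    Py = [sum(p[j] * yi for p, yi in zip(P, y)) / n for j in range(K)]
    pi_hat = solve(S, Py)
    resid = [yi - sum(pj * cj for pj, cj in zip(p, pi_hat)) for p, yi in zip(P, y)]
    w = solve(S, a)  # w = S^{-1} a; S is symmetric, so a' S^{-1} = w'
    # w' V w / n = sum_i (w' p_i * u_i)^2 / n^2
    omega_hat = sum(
        (sum(wj * pj for wj, pj in zip(w, p)) * u) ** 2 for p, u in zip(P, resid)
    ) / n ** 2
    theta_hat = sum(aj * cj for aj, cj in zip(a, pi_hat))
    return theta_hat, omega_hat
```

For a point evaluation functional at q = 0.5 one would pass `a = power_series(0.5, K)`; a rough 95 percent interval is then `theta_hat` plus or minus `1.96 * omega_hat ** 0.5`, subject to the conditions given below for consistency of the variance estimator.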

Further conditions are useful for asymptotic normality of $A(\hat h)$ and
consistency of $\hat\Omega$. Let

    $\Sigma = E[p^K(q_i)p^K(q_i)']$,   $V = E[p^K(q_i)p^K(q_i)'u_i^2]$,   $\Omega = A'\Sigma^{-1}V\Sigma^{-1}A/n$.

Assumption 5.1: Assumptions 3.1 - 3.2 are satisfied, with $s > 4\mu/(\mu-1)$,
and i) $E[u_i^2|q_i]$ is bounded away from zero; ii) For any $h(q) \in \mathcal{H}$,
$E[h(q_i)u_i | z_{i-1}, z_{i-2}, \ldots] = 0$; iii) $K = K(n)$ is nonrandom.

Part ii) is the martingale difference assumption: it holds if the
observations are independent or $h(q_i) = E[y_i | z_{i-1}, z_{i-2}, \ldots]$.

Assumption  5.2:   A  has  full  column  rank  for  some  K  and  for  all  K  there  is
a  nonsingular  matrix  C^  such  that  for  all   J  ^  K,   C  P  (q)   does  not  depend 
on  K. 


Part  ii)  of  this  hypothesis  rules  out  asymptotic  linear  dependence  among 
different  components  of  A(h).   When  A(h)   is  mean-square  continuous  in  h, 
a primitive condition for this hypothesis is that $A(h)$ is onto $R^s$, as
discussed  below. 

Assumption  5.3:   Assumption  3.4,  3.8,  and  3.9  are  satisfied  for  v  ^  2,   K  £ 
K,   KK<Q(K)Vn  — >  0,   and  vQC~"  -^   0. 

The second condition requires essentially that the bias converges to zero
faster than $1/\sqrt{n}$ (see Assumption 3.8), which is stronger than the natural
condition that the bias go to zero faster than the variance.

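Theorem 5.1: If Assumptions 5.1 - 5.3 are satisfied then

    $\Omega^{-1/2}[A(\hat h) - A(h)] \xrightarrow{d} N(0, I)$,   $\Omega^{-1/2}(\hat\Omega - \Omega)\Omega^{-1/2\prime} \xrightarrow{p} 0$.

Furthermore, if there exists a scalar $\psi_n > 0$ and nonsingular $\Omega_0$ such that
$\psi_n\Omega \to \Omega_0$, then

    $\psi_n^{1/2}[A(\hat h) - A(h)] \xrightarrow{d} N(0, \Omega_0)$,   $\psi_n\hat\Omega \xrightarrow{p} \Omega_0$.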


This result improves on Andrews (1991) in applying to estimators of
projections other than conditional expectations, allowing for dependence, and
having a faster growth rate for K, but restricts K to be nonrandom: it is
more difficult to allow for random K with dependent observations.

The second conclusion is useful for forming asymptotic Gaussian
confidence intervals in the usual way. If the hypotheses of Theorem 4.3 are
satisfied and $K^{1/2}\zeta_d(K)[(K/n)^{1/2} + K^{-\alpha}] \to 0$, so that $A(\hat h)$ is a
consistent estimator of $A(h)$, the delta-method can be used to make inference
about smooth nonlinear functions of $A(h)$ in the standard way. The
hypothesis about $\psi_n$ is not restrictive when $A(h)$ is a scalar, where $\psi_n = \Omega^{-1}$
will satisfy the hypotheses. However, when $A(h)$ is a vector, it
requires essentially that the variance of each component of $A(\hat h)$ converge
to zero at the same rate, which may not be true when, e.g., $A(h)$ includes
both $h(q)$ and its derivatives at a single point. It is possible to derive a
primitive condition for $\psi_n = n$, corresponding to $\sqrt{n}$-consistency of $A(\hat h)$,
which is stated in the following result.

Theorem 5.2: Suppose that i) Assumptions 5.1 and 5.3 are satisfied for $d = 0$
and $\nu = 2$; ii) for any $h(q) \in \mathcal{H}$ there exists $\pi_K$ such that
$E[\{h(q) - p^K(q)'\pi_K\}^2] \to 0$ as $K \to \infty$; iii) there exists an $s \times 1$ vector
$\delta(q)$ of elements of $\mathcal{H}$ such that $E[\delta(q)\delta(q)']$ exists and is nonsingular,
and $A(h) = E[\delta(q)h(q)]$ for all $h \in \mathcal{H}$. Then for $\Omega_0 = E[\sigma^2(q)\delta(q)\delta(q)']$,

    $\sqrt{n}[A(\hat h) - A(h)] \xrightarrow{d} N(0, \Omega_0)$,   $n\hat\Omega \xrightarrow{p} \Omega_0$.


Hypothesis ii) is the minimal mean-square spanning requirement for
consistency of the sample projection. By the multivariate Riesz representation
theorem (e.g. see Hansen, 1985), hypothesis iii) is equivalent to the
statement that $A(h)$ is mean-square continuous and has range $R^s$.
Furthermore, mean-square continuity of such linear functionals is a necessary
condition for a finite semiparametric variance bound for $A(h)$, as in Stein
(1956), and hence for existence of a (regular) $\sqrt{n}$-consistent estimator
(Chamberlain, 1985), so that mean-square continuity of $A(h)$ characterizes
the $\sqrt{n}$-consistent case.

Theorem 5.3: If $A(h)$ is a scalar functional that is not mean-square
continuous then the semiparametric variance bound for $A(h)$ is infinite.

Theorem 5.2 can be specialized to many interesting examples, including
the parameters of the finite dimensional component of the projection and
average derivatives. As noted in equation (2.4), the parameters $\beta$ of $h(q)
= q_1'\beta + h_2(q_2)$ satisfy $\beta = E[\delta(q)h(q)]$ for $\delta(q) = M^{-1}[q_1 - P(q_1|\mathcal{H}_2)]$.
Furthermore, the mean-square spanning hypothesis of ii) will be satisfied as
long as $(p_{1K}(q_2), \ldots, p_{KK}(q_2))'$ span $\mathcal{H}_2$, giving the following result:

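Theorem 5.4: Suppose that i) Assumptions 5.1 and 5.3 are satisfied; ii) for
any $h_2(q_2) \in \mathcal{H}_2$ there exists $\pi_K$ such that
$E[\{h_2(q_2) - p^K(q_2)'\pi_K\}^2] \to 0$ as $K \to \infty$; iii)
$E[\{q_1 - P(q_1|\mathcal{H}_2)\}\{q_1 - P(q_1|\mathcal{H}_2)\}']$ is nonsingular. Then for $\Omega_0 =
M^{-1}E[\sigma^2(q)\{q_1 - P(q_1|\mathcal{H}_2)\}\{q_1 - P(q_1|\mathcal{H}_2)\}']M^{-1}$,

    $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, \Omega_0)$,   $n\hat\Omega \xrightarrow{p} \Omega_0$.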


Sample projection estimators of $\beta$ have been previously analyzed by
Chamberlain (1986), Andrews (1991), and Newey (1990), but only under
$q_1'\beta + h_2(q_2) = E[y|q]$, an unrestricted functional form for $h_2(q_2)$ (e.g.
$h_2(q_2)$ could not be additive), and independent observations. One implication
of this result is that if $E[y|q] = q_1'\beta + h_2(q_2)$ and $\sigma^2(q) = Var(y|q)$ is
constant, then an estimator that imposes additivity of $h_2(q_2)$ will be
asymptotically more efficient than one that does not: the asymptotic variance
matrices are $\sigma^2(E[\{q_1 - P(q_1|\mathcal{H}_2)\}\{q_1 - P(q_1|\mathcal{H}_2)\}'])^{-1}$ and
$\sigma^2(E[\{q_1 - E[q_1|q_2]\}\{q_1 - E[q_1|q_2]\}'])^{-1}$ respectively, which have a positive
semi-definite difference. Thus, although imposing additivity does not improve
the convergence rate of $\hat\beta$, it can lower its asymptotic variance.
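
The positive semi-definite ordering of the two variance matrices can be checked in a few lines. The following is a sketch, assuming both second-moment matrices are nonsingular and that the additive class is contained in the set of square-integrable functions of the second block of regressors; the shorthand $e_{\mathcal{H}}$, $e_E$, and $\Delta$ is introduced here only for illustration.

```latex
\begin{align*}
e_{\mathcal{H}} &= q_1 - P(q_1|\mathcal{H}_2), \qquad
e_E = q_1 - E[q_1|q_2], \qquad
\Delta = E[q_1|q_2] - P(q_1|\mathcal{H}_2), \\
e_{\mathcal{H}} &= e_E + \Delta, \qquad
E[e_E \Delta'] = 0 \quad \text{(because $\Delta$ is a function of $q_2$)}, \\
E[e_{\mathcal{H}} e_{\mathcal{H}}'] &= E[e_E e_E'] + E[\Delta\Delta'] \;\succeq\; E[e_E e_E'], \\
\sigma^2 \big(E[e_{\mathcal{H}} e_{\mathcal{H}}']\big)^{-1} &\preceq \sigma^2 \big(E[e_E e_E']\big)^{-1}.
\end{align*}
```

The last line uses the fact that matrix inversion reverses the positive semi-definite ordering between positive definite matrices.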

Theorem 5.2 can also be specialized to the average derivative functional
of equation (2.4) for certain weights. If there are no boundary terms then
integration by parts gives

    $A_j(h) = \int_Q w_j(q)\partial^{\lambda_j}h(q)dq = (-1)^{|\lambda_j|}\int_Q \partial^{\lambda_j}w_j(q)h(q)dq$

    $= E[\delta_j(q)h(q)]$,   $\delta_j(q) = (-1)^{|\lambda_j|}P(\partial^{\lambda_j}w_j(q)/f(q) | \mathcal{H})(q)$,

where $f(q)$ is the density of $q$. Here, $E[\delta_j(q)^2]$ will be finite if
$f(q)$ is not too small relative to $\partial^{\lambda_j}w_j(q)$. Let $\delta(q) = (\delta_1(q), \ldots, \delta_s(q))'$.
Following Stoker (1986), the previous integration by parts will be valid, and
hence $A(h)$ mean-square continuous, under the hypotheses of the following
result.
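
The integration by parts step can be checked numerically in a simple case. The sketch below assumes a scalar regressor on [-1, 1], a first derivative, the weight w(q) = (1 - q^2)^2 (which vanishes on the boundary), and h(q) = sin(q); these choices and the function names are illustrative assumptions, not taken from the text.

```python
import math


def simpson(f, a, b, n=2000):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    step = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * step)
    return total * step / 3


def weight(q):
    """Weight function that is zero at q = -1 and q = 1."""
    return (1 - q * q) ** 2


def weight_derivative(q):
    """Derivative of the weight function."""
    return -4 * q * (1 - q * q)


def check_integration_by_parts():
    """Return (int w h', -int w' h) for h = sin on [-1, 1]."""
    lhs = simpson(lambda q: weight(q) * math.cos(q), -1.0, 1.0)
    rhs = -simpson(lambda q: weight_derivative(q) * math.sin(q), -1.0, 1.0)
    return lhs, rhs
```

The two returned values agree up to quadrature error because the boundary term w(q)h(q) vanishes at both endpoints; with a weight that does not vanish there, the two integrals would generally differ.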


Theorem 5.5: Suppose that i) Assumptions 5.1 and 5.3 are satisfied; ii) for
any $h(q) \in \mathcal{H}$ there exists $\pi_K$ such that $E[\{h(q) - p^K(q)'\pi_K\}^2] \to 0$ as $K \to
\infty$; iii) $Q$ is convex with nonempty interior, $w_j(q)$ is continuously
differentiable to order $|\lambda_j|$ on $Q$ and $\partial^\lambda w_j(q) = 0$ on the boundary of $Q$
for all $|\lambda| \le |\lambda_j|$, and $E[\delta(q)\delta(q)']$ exists and is nonsingular. Then for $\Omega_0 =
E[\sigma^2(q)\delta(q)\delta(q)']$,

    $\sqrt{n}(A(\hat h) - A(h_0)) \xrightarrow{d} N(0, \Omega_0)$,   $n\hat\Omega \xrightarrow{p} \Omega_0$.


Primitive conditions for these results for power series and splines are given
in the following Sections.

6.    Power Series

This Section gives primitive conditions for consistency and asymptotic
normality of projections that use power series. Throughout both this and
the next Section, it will be assumed that $\mathcal{H}$ is as specified in equation
(2.2), and the following hypothesis is satisfied:

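Assumption 6.0: i) For each $q_{2\ell}$, ($\ell = 1, \ldots, L$), if $\tilde q$ is a subvector
of $q_{2\ell}$ then $\tilde q = q_{2\ell'}$ for some $\ell'$; ii) There exists a constant $c > 1$
such that for each $\ell$, with the partitioning $q_2 = (q_{2\ell}', \tilde q_{2\ell}')'$, for any $a(q)
> 0$, $c\int a(q)d[F(q_{2\ell}) \cdot F(\tilde q_{2\ell})] \ge E[a(q)] \ge c^{-1}\int a(q)d[F(q_{2\ell}) \cdot F(\tilde q_{2\ell})]$; iii)
Either $\beta = 0$ (i.e. $q_1$ is not present) or $q_1$ is bounded and for the closure
$\bar{\mathcal{H}}_2$ of $\{\sum_{\ell=1}^L h_{2\ell}(q_{2\ell})\}$, $E[\{q_1 - P(q_1|\bar{\mathcal{H}}_2)\}\{q_1 - P(q_1|\bar{\mathcal{H}}_2)\}']$ is nonsingular.

Conditions i) and ii) are sufficient for $\{\sum_{\ell=1}^L h_{2\ell}(q_{2\ell})\}$ to be closed.
Boundedness of $q_1$ can be relaxed, but for brevity is not here.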

Power series estimators will be mean-square consistent if the regressor
distribution has an absolutely continuous component and K grows
slowly enough; see Newey (1988a). To obtain convergence rates it is useful
to bound the regressor density below, as follows:

Assumption 6.1: There are finite $q_j^1 < q_j^2$, $\nu_j \ge 0$, ($j = 1, \ldots, r$) such
that the support of $q_2$ is $\prod_{j=1}^r [q_j^1, q_j^2]$ and the distribution of $q_2$ has
absolutely continuous component with density bounded below by
$C\prod_{j=1}^r [(q_j - q_j^1)(q_j^2 - q_j)]^{\nu_j}$ on the support.


It  is  also  possible  to  allow  for  a  discrete  regressor  with  finite  support,  by 
including  all  dummy  variables  for  all  points  of  support  of  the  regressor,  and 
all  interactions.   Because  such  a  regressor  is  essentially  parametric,  and 
allowing for it does not change any of the convergence rate results, this
generalization will not be considered here.

To state further conditions, let $\{\lambda(k)\}$ denote the sequence of
multi-indices used in defining the power series or spline interactions in
equations (2.3) and (2.4) respectively, and for any (other) multi-index $\lambda$, let

    $\mathcal{J}(k) = \{j : \lambda_j(k) \ne 0\}$,   $J(k) = \#\mathcal{J}(k) \le r$,

    $\bar\nu = \max_k \sum_{j \in \mathcal{J}(k)} \nu_j/J(k)$,   $\bar\lambda = \max_k \sum_{j \in \mathcal{J}(k)} \lambda_j/J(k)$.


For power series, Assumption 3.7 follows from

Assumption 6.2: $K^{4+4\bar\nu}/n \to 0$.

For $r = 1$ and $\bar\nu = 0$, this condition is $K = o(n^{1/4})$, which is weaker than
the requirement in Cox (1988, p. 715).

Primitive  approximation  rate  conditions  (as  in  Assumption  3.8)  for  power 

series  follow  from  known  results  of  Lorentz  (1986),  Powell  (1981),  or  a  Taylor 

expansion. 


Assumption 6.3: Each of the components $h_{2\ell}(q_{2\ell})$, ($\ell = 1, \ldots, L$), is
continuously differentiable of order $h$ on the support of $q_{2\ell}$.

This hypothesis implies Assumption 3.8 for $\nu = \infty$, with $d = 0$ and $\alpha = h/r$,
and with $\alpha = h - d$ when $r = 1$. A literature search has not yet revealed
corresponding conditions for $d > 0$ and $r > 1$, but rates for this case
follow from a Taylor expansion under the following (strong) condition:

Assumption 6.4: There is a constant $C$ such that for each multi-index $\lambda$,
the $\lambda$th partial derivative of each additive component of $h(q)$ exists and
is bounded by $C^{|\lambda|}$.

The first power series result gives convergence rates.

Theorem 6.1: Suppose that Assumptions 3.1 - 3.3 and 6.0 - 6.3 are satisfied.
Then

    $\sum_{i=1}^n [\hat h(q_i) - h(q_i)]^2/n = O_p(K/n + K^{-2h/r})$,

    $\int[\hat h(q) - h(q)]^2 dF(q) = O_p(K/n + K^{-2h/r})$,

    $\sup_{q \in Q} |\hat h(q) - h(q)| = O_p(K^{1+2\bar\nu}\{[K/n]^{1/2} + K^{-h/r}\})$.

Suppose, in addition, that either a) $r = 1$, $\lambda < h$, and $\alpha = h - \lambda$; or b)
Assumption 6.4 is satisfied and $\alpha$ is any positive number. Then

    $\sup_{q \in Q} |\partial^\lambda \hat h(q) - \partial^\lambda h(q)| = O_p(K^{1+2\bar\nu+2\bar\lambda}\{[K/n]^{1/2} + K^{-\alpha}\})$.

This result implies optimal convergence rates for power series estimators of
$h(q)$ when K goes to infinity at the optimal rate and Assumption 6.2 is
satisfied. If the density of $q$ is bounded away from zero, $K = Cn^{\gamma}$,
$\gamma = r/(2h + r)$, and $h > 3r/2$, the mean-square convergence rate for
$\hat h(q)$ is $n^{-2h/(2h+r)}$, which attains Stone's (1982) bounds. The side
condition that $h > 3r/2$, which is needed to guarantee Assumption 6.2, limits
this optimality result, but is weaker than the corresponding condition in Cox
(1988). These mean-square error results apply to additive projections (rather
than conditional expectations), like Stone (1985, 1990) but unlike Cox (1988)
or Andrews and Whang (1990), allow for interactive terms, similarly to Stone
(1990) (although Stone (1985) also derives optimal rates for derivatives),
and allow for dependent observations. The side condition Assumption 6.2 is
not present in Stone (1985) or Andrews and Whang (1990), but it implies a
population mean-square error result, unlike Andrews and Whang (1990). In
comparison  with  Cox's  (1988)  uniform  convergence  rate  of  K   /Vn  +  K    =
K  ( [K/n]    +  K   )   for  univariate  q  with  density  bounded  away  from  zero  (in

1/2 
Cox's  notation,   h  =  1   and  k  =  2),   the  rate  here  is  the  faster  K([K/n] 

+  K   ),   and  uniform  convergence  rates  for  derivatives  are  given  here. 
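
The optimal-rate arithmetic for power series can be made concrete with a small calculation. The sketch below assumes the reading that the number of terms grows like n to the power r/(2h + r), where h is the smoothness order and r the regressor dimension, that the density is bounded away from zero, and that the growth restriction is K^4/n going to zero; the function name is hypothetical.

```python
def optimal_power_series_rate(h, r):
    """Return (gamma, mse_exponent, side_condition_ok) for smoothness h, dimension r.

    gamma is the exponent in K = C * n**gamma, mse_exponent is the exponent of n
    in the implied mean-square rate, and side_condition_ok reports whether
    K**4 / n -> 0 holds for that gamma (equivalently h > 3r/2).
    """
    gamma = r / (2 * h + r)
    # Variance term K/n and squared bias term K**(-2h/r) balance at this exponent.
    mse_exponent = -2 * h / (2 * h + r)
    side_condition_ok = 4 * gamma < 1
    return gamma, mse_exponent, side_condition_ok
```

For example, h = 2 and r = 1 gives gamma = 1/5 and a mean-square rate exponent of -4/5, and the side condition holds; with h = 1 and r = 1 the balancing choice gamma = 1/3 violates the growth restriction.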

To state the asymptotic normality result for power series, let $\hat\Omega$ denote
the variance matrix estimator described in Section 5, for power series.

Theorem 6.2: Suppose that i) Assumptions 3.1, 3.2, 5.1 - 5.2, 6.0, 6.1, are
satisfied with $s > 4\mu/(\mu-1)$, $K^{5+4\bar\nu}/n \to 0$; ii) $A(h)$ is continuous with
respect to the Sobolev norm $|h|_{d,\infty}$, and either a) Assumption 6.3 is satisfied,
$d = 0$, and $\sqrt{n}K^{-h/r} \to 0$; or b) Assumption 6.3 is satisfied, $r = 1$,
$\sqrt{n}K^{-(h-d)} \to 0$; or c) Assumption 6.4 is satisfied and $K/n^{\gamma} \to \infty$ for some
$\gamma > 0$; iii) there exists a scalar $\psi_n > 0$ and nonsingular $\Omega_0$ such that
$\psi_n\Omega \to \Omega_0$. Then

    $\psi_n^{1/2}[A(\hat h) - A(h)] \xrightarrow{d} N(0, \Omega_0)$,   $\psi_n\hat\Omega \xrightarrow{p} \Omega_0$.


In comparison with Andrews (1991), this result applies to projections other
than the conditional expectation, allows for dependent observations, and
has weaker growth rate conditions for K: if $\bar\nu = 0$ then i) requires $K =
o(n^{1/5})$, which is weaker than the requirement in Andrews (1991). For $d = 0$,
i) and ii) imply that $h > 5r/2$ (e.g. $h(q)$ is thrice continuously differentiable
when $r = 1$).

This result can be applied to estimation of the components of an additive
projection and their derivatives, when the observations are independent.

2

Theorem  6.3:      Suppose   the  observations  are   independent,      a-  (q)      is  bounded 

and  bounded  away  from  zero,       |u|   <  oo  for     s  >  2,      Assumptions  6.0  and  6.1 

are  satisfied  for     v  .  =  0,      each     h  Xq  .)     is  continuously  different iable  to 

J  J     J  ^ 

order     h     on     Q,      K  =  o(n       ),      and     VnK       — >  0.      Then  for  any  pair  of  points 
q  .,      q  .     in   the  support   of     q  .,      ( J  =   1.    ....  r), 

tr^^^lChrq.)   -  hXq.)}    -    (h.^Cq.)   -  h.^Cq.)}]   -^  N(0,1). 

Also,    if     VnK  — >  0,      then  for  any     q .     in  the  support  of     q  ., 

Q~^^^[a^h  Xq  .)/dq^.  -   a'^h  .Jq  .)/dq^.]   -^  U(0,  1). 


The differencing normalization here is different than the mean centering
in Stone (1985), which would be more difficult to work with.

A $\sqrt{n}$-consistency and asymptotic normality result for power series
estimates of mean-square continuous linear functionals is:

Theorem 6.4: Suppose that i) Assumptions 5.1, 6.0, 6.1, are satisfied and
$K^{5+4\bar\nu}/n \to 0$; ii) Assumption 6.3 is satisfied, and $\sqrt{n}K^{-h/r} \to 0$; iii) there
exists an $s \times 1$ vector $\delta(q)$ of elements of $\mathcal{H}$ such that $E[\delta(q)\delta(q)']$
exists and is nonsingular, and $A(h) = E[\delta(q)h(q)]$ for all $h \in \mathcal{H}$. Then for
$\Omega_0 = E[\sigma^2(q)\delta(q)\delta(q)']$,

    $\sqrt{n}[A(\hat h) - A(h)] \xrightarrow{d} N(0, \Omega_0)$,   $n\hat\Omega \xrightarrow{p} \Omega_0$.

This result can be specialized to the parameters of a finite dimensional
component and average derivatives, as follows.

Theorem 6.5: If hypotheses i) and ii) of Theorem 6.4 are satisfied then for
$\Omega_0 = M^{-1}E[\sigma^2(q)\{q_1 - P(q_1|\mathcal{H}_2)\}\{q_1 - P(q_1|\mathcal{H}_2)\}']M^{-1}$,

    $\sqrt{n}(\hat\beta - \beta_0) \xrightarrow{d} N(0, \Omega_0)$,   $n\hat\Omega \xrightarrow{p} \Omega_0$.


This result gives fully primitive regularity conditions for $\sqrt{n}$-consistency and
asymptotic normality of a power series estimator of the parameters of a finite
dimensional component of a projection. It allows for $h_2(q_2)$ to have the
additive  form  discussed  above,  and  also  allows  for  dependent  observations.   An 
analogous  result  can  be  given  for  weighted  average  derivatives,  although  for 
brevity  such  a  result  is  only  given  below  for  splines. 


7.    Splines 


Results for splines are limited to the case

Assumption 7.1: Assumptions 6.0 and 6.1 are satisfied with $\nu_j = 0$, ($j = 1,
\ldots, r$).

Splines allow for a faster growth rate for the number of terms.

Assumption 7.2: $K^3/n \to 0$.

Approximation rate conditions for $r = 1$ or $d = 0$ follow from known
results, but a literature search has not yet revealed conditions for other
cases, which limits the following results.

Theorem 7.1: Suppose that $r = 1$ or $d = 0$, Assumptions 3.1 - 3.3, 7.1,
7.2, and 6.3 are satisfied, and $m \ge h - 1$. Then for $|\lambda| \le m$,

    $\sum_{i=1}^n [\hat h(q_i) - h(q_i)]^2/n = O_p(K/n + K^{-2h/r})$,

    $\int[\hat h(q) - h(q)]^2 dF(q) = O_p(K/n + K^{-2h/r})$,

    $\sup_{q \in Q} |\hat h(q) - h(q)| = O_p(K\{[K/n]^{1/2} + K^{-h/r}\})$.

If, in addition, $r = 1$, then

sup^^^\a^h(q)-ah(q)\    =  0   (K^'^'^dK/n]^^^   +  K^*^}). 

This result yields optimal mean-square convergence rates for spline regression
estimation of an additive projection with dependent observations, if $K =
n^{r/(2h+r)}$: here the side condition of Assumption 7.2 is satisfied if $h > r$.

Throughout the rest of Section 7, let $\hat\Omega$ be the variance estimator
computed as described in Section 5, using splines.

Theorem 7.2: Suppose that $r = 1$ or $d = 0$; i) Assumptions 5.1 - 5.2, 7.1,
are satisfied and $K^4/n \to 0$; ii) $A(h)$ is continuous with respect to the
Sobolev norm $|h|_{d,\infty}$ for $d \le m$, Assumption 6.3 is satisfied for $m \ge h - 1$,
and $\sqrt{n}K^{-(h/r)+d} \to 0$; iii) there exists a scalar $\psi_n > 0$ and nonsingular $\Omega_0$ such
that $\psi_n\Omega \to \Omega_0$. Then

    $\psi_n^{1/2}[A(\hat h) - A(h)] \xrightarrow{d} N(0, \Omega_0)$,   $\psi_n\hat\Omega \xrightarrow{p} \Omega_0$.

Apparently, there are no other asymptotic normality results for spline
projections in the literature. The growth rate condition $K = o(n^{1/4})$ is weaker
than the corresponding condition for power series, so asymptotic normality will
require only twice differentiability of $h(q)$ rather than the thrice
differentiability for power series. This result can be specialized analogously
to Theorem 6.3, although for brevity this specialization is omitted.

A $\sqrt{n}$-consistency and asymptotic normality result for spline estimators of
mean-square continuous linear functionals is:

Theorem 7.3: Suppose that i) Assumptions 5.1, 6.1, are satisfied and $K^4/n \to 0$;
ii) Assumption 6.3 is satisfied, $m \ge h - 1$ and $\sqrt{n}K^{-h/r} \to 0$; iii) there exists
an $s \times 1$ vector $\delta(q)$ of elements of $\mathcal{H}$ such that $E[\delta(q)\delta(q)']$ exists
and is nonsingular, and $A(h) = E[\delta(q)h(q)]$ for all $h \in \mathcal{H}$. Then for $\Omega_0 =
E[\sigma^2(q)\delta(q)\delta(q)']$,

    $\sqrt{n}[A(\hat h) - A(h)] \xrightarrow{d} N(0, \Omega_0)$,   $n\hat\Omega \xrightarrow{p} \Omega_0$.


This  result  can  be  specialized  to  the  parameters  of  a  finite  dimensional 
component  and  average  derivatives,  as  follows.   For  brevity,  only  the  average 
derivative result is given here. Let $q_{-j}$ denote the vector of all the
components of $q$ other than the $j$th, and $f(q_j|q_{-j})$ the conditional
density of the $j$th component given the others.

Theorem 7.4: Suppose that hypotheses i) and ii) of Theorem 7.3 are satisfied,
for some integer $d > 0$, $d \le m$, $w_j(q)$ is continuously differentiable to
order $|\lambda_j|$ on $Q$ and $\partial^\lambda w_j(q) = 0$ on the boundary of $Q$ for all $|\lambda| \le
|\lambda_j|$, $h_{j0}(q_j)$ is continuously differentiable to order $d$, and
$\|[\partial^d w(q_j)/\partial q_j^d]/f(q_j|q_{-j})\|_\infty < \infty$. Then

    $\sqrt{n}[\int_{[-1,1]} w(q_j)[\partial^d \hat h_j(q_j)/\partial q_j^d]dq_j - \int_{[-1,1]} w(q_j)[\partial^d h_{j0}(q_j)/\partial q_j^d]dq_j]$

    $\xrightarrow{d} N(0, E[\sigma^2(q)\delta(q)^2])$,   $n\hat\Omega \xrightarrow{p} E[\sigma^2(q)\delta(q)^2]$.


8.    Proofs of Theorems

The proofs of Sections 8 and 9 are abbreviated, with details provided
only for central or potentially unfamiliar results. A longer version of these
sections is available from the author upon request. Throughout, let $C$ be
a generic positive constant and $\lambda_{\min}(B)$ and $\lambda_{\max}(B)$ be minimum and
maximum eigenvalues of a symmetric matrix $B$. The following Lemmas are useful
in proving the results for power series and splines.

Lemma 8.0: If Assumption 6.0 i) and ii) are satisfied, then $\{\sum_{\ell=1}^L h_{2\ell}(q_{2\ell}):
E[h_{2\ell}(q_{2\ell})^2] < \infty, \ell = 1, \ldots, L\}$ is closed in mean-square. If in addition
Assumption 6.0 iii) is satisfied then $\mathcal{H}$ is closed.


Proof: For now, let $\mathcal{H}_2 = \{\sum_\ell h_{2\ell}(q_{2\ell})\}$, and for notational convenience
drop the 2 subscript. By Proposition 2 of Section 4 of the Appendix of
Bickel, Klaassen, Ritov, and Wellner (1990), $\mathcal{H}$ closed is equivalent to
existence of a constant $C$ such that for each $h \in \mathcal{H}$ there is $h_\ell(q_\ell)$ with
$\|h\| \ge C\max_\ell\{\|h_\ell\|\}$. Existence of such a $C$ can be shown using an induction
argument like that of Stone (1990, Lemma 1, "$L_2$ Rate of Convergence for
Interaction Spline Regression," Tech. Rep. No. 268, Berkeley). As noted
there, assuming that this property holds for each maximal dimension less
than $r$, for each $h \in \mathcal{H}$, there is a unique decomposition $h = \sum_\ell h_\ell(q_\ell)$,
such that for all $q_{\ell'}$ that are strict subvectors of $q_\ell$, $E[h_\ell(q_\ell)\delta(q_{\ell'})] = 0$
for all measurable functions $\delta(q_{\ell'})$ of $q_{\ell'}$ with finite mean-square. Then, following
Stone (1990), it suffices to show that for any "maximal" $q_k$, that is not a
proper subvector of any other $q_\ell$, there is a constant $c > 1$ such that

    $E[h(q)^2] \ge c^{-1}E[h_k(q_k)^2]$.

To show this property, note that holding fixed the vector of components
$\tilde q_k$ of $q$ that are not components of $q_k$, for each $\ell \ne k$, $h_\ell(q_\ell)$ is a function
of a strict subvector of $q_k$. Then,

    $E[h(q)^2] \ge c^{-1}\int\{h_k(q_k) + \sum_{\ell \ne k} h_\ell(q_\ell)\}^2 dF(q_k)dF(\tilde q_k)$
    $= c^{-1}\int[\int\{h_k(q_k) + \sum_{\ell \ne k} h_\ell(q_\ell)\}^2 dF(q_k)]dF(\tilde q_k)$
    $= c^{-1}\int[\int\{h_k(q_k)^2 + \{\sum_{\ell \ne k} h_\ell(q_\ell)\}^2\}dF(q_k)]dF(\tilde q_k)$
    $\ge c^{-1}\int[\int h_k(q_k)^2 dF(q_k)]dF(\tilde q_k) = c^{-1}E[h_k(q_k)^2]$.


To show the second conclusion, now add back the 2 subscript, and let $\mathcal{H}
= \{q_1'\beta + h_2(q_2) : h_2 \in \mathcal{H}_2\}$. Consider a sequence $h_j \in \mathcal{H}$, $h_j \to h_0$ in mean
square. To show $h_0 \in \mathcal{H}$, note $\beta_j$ is a mean-square continuous function of
$h_j$, so that by $h_j$ a Cauchy sequence, $\beta_j$ is a Cauchy sequence, and hence
converges to some $\beta_0$, and hence $q_1'\beta_j$ converges to $q_1'\beta_0$ in mean-square,
and hence $h_{2j} = h_j - q_1'\beta_j$ converges to $h_0 - q_1'\beta_0$, which is an element of $\mathcal{H}_2$
by $\mathcal{H}_2$ closed, so $h_0 \in \mathcal{H}$.    ■

Lemma  8.1:      If   the  support     Q     of     q .      is  a  box  and   there   is     C     such    that     ^ 
=   {f(q):    each  additive  component      f(q)     of     f(q)      is  continuously 
different iable  of  order     ^     and     max    \d  f(q)\    ^  C}.      then  for  power  series, 
there   is     C  >  0     such   that   for  all      f  e  5,  inf     ^K\f(q)-p'^(q)'n\_,       <  CK'"'. 
for     d  =  0     and     a  =  ^/n.,      and  for     a  =   1,      (  <  d,      and     a  =  f-d. 


Proof:   First,  note  that  it  suffices  to  show  the  result  for  a  =  r,   since 
the  approximation  error  of  the  function  is  bounded  by  the  sum  of  errors  over 
all  additive  components.   For  the  first  conclusion,  note  that  by   |\(K)| 
monotonic  increasing,  the  set  of  all  linear  combinations  of  p  (q)  will 
include  the  set  of  all  polynomials  of  degree  QC     for  some  C  small
enough,  so  Theorem  8  of  Lorentz  (1986)  applies.   For  the  second  conclusion,

let  Q  =  [q  ,q  ],  and  note  that  d   p    (q)/9q   is  a  spanning  vector  for  power 

series  up  to  order  K.   By  the  first  conclusion,  there  exists  C  such  that 

for  all   f  €  ?,   there  is  n       with  f^(q)  =  P    (q)'Tr   satisfying 

sup  |af(q)/aq  -  5fj,(q)/5q|  £   C-K  ''     .      The  second  conclusion  then  follows 

by  integration  and  boundedness  of  Q,   e.g.  for  d  =  1  and  the  constant 

coefficient  chosen  so  that  f(q,  )  =  f„(q,  ),   |f(q)  -  f(,(q)|  ^ 

1      K.   1  Jv 

S    ia'*f(q)/aq'*  -  a^f„(5)/aq|d5  ^  ck"^"'\   ■ 

Lemma  8.2:      If   the  support     Q     of     q.      is  star-shaped  and   there   is     C     such 
that     ^  =   {f(q):    each  additive  component     f(q)     of     f(q)      is  continuously 
different iable  of  all   orders  and  for  all   multi-indices     X,      max^\d  f(q)\    ^ 
C   },      then  for  power  series,    for  all     a,    d  >  0     there  is     C  >  0     such   that 
for  all     f  e  ?,  inf     „fi\f(q)-p'^(q)' ti\  ^       s  CK~°^. 

Proof:   As  above,  assume  without  loss  of  generality  (w. l.g. )  that  r  =  n.      By 
Q  star-shaped,  there  exists  q  e  Q  such  that  for  all   q  e  Q,  ^q  +  (l-3)q  €  Q 
for  all  0  s  p  £  1.   For  a  function  f(q),   let  P(f,m,q)   denote  the  Taylor 
series  up  to  order  m  for  an  expansion  around  q.   Note  5P(f , m, q)/aq  .  = 
P(af/aq.,m-l,q),   so  that  by  induction  aV(f,m,q)  =  PO'^f ,  m-1  A| ,  q).   Also, 
a  f(q)   also  satisfies  the  hypotheses,  so  that  by  the  intermediate  value  form 
of  the  remainder, 

max  ^j^la'^fCq)  -  PO^f ,  m- |  A|  ,  q)  |  ^  cVe  (m-d) !  ] . 

Next,  let  m(K)  be  the  largest  integer  such  that  P(f,m,q)   is  a  linear 
combination  of  p  (q),   and  let  fj^Cq)  =  P(f,m(K),q).   By  the  "natural 
ordering"  hypothesis,  there  are  constants  C   and  C   such  that  C,m(K)   s  K 
i  C  mCK)"",   so  that  for  any  a  >  0,   c'"^'^  V[  (m(K)-d) !  ]  £   CK"",   and
sup|^l^^  ^\d\(q)-dh^(.q)\    =   sup|^|^^^Q|a^f(q)-P(a^f.m(K)-U|.q)|  ^  CK'


Lemma  8.3:      If   the  support     Q     of     q .      is  a  box,      1=1      or  d  =  0,      there  is 

C     such   that     ?  =  {f(q):    each  additive  component      f(q)     of     f(q)      is 

continuously  different iable  of  order     ^     and     max    \d  f(q)\    ^  C),      and     m   a 

^-1,      then  for  splines   there   is     C  >  0     such   that   for  all      f  e  ?, 

inf     ^K\f(q)-p'^(q)'Ti\,        <  CK'"-,      n.  =   1.      ^  <  d.      and     a  =   ((/ri)-d. 
TteiK  a,  00 


Proof:   The  result  for  d  =  0  follows  by  Theorem  12.8  of  Schumaker  (1981). 
For  the  other  case,  w. l.g.  assume  r  =  1  and  let  Q  =  [-1,1].  Note  that 
d   p  (q)/aq  is  a  spanning  vector  for  splines  of  degree  m-d,   with  knot 
spacing  bounded  by  CK    for  K   large  enough  and  some  C.   Therefore,  by 
Powell  (1981),  there  exists  n       such  that  for  f„(q)  =  p  (q)'Tr  , 
sup  I  a  f(q)/aq  -  a  f  (q)/aq  I  i  C'K    .   The  conclusion  then  follows  by 
integration.    ■ 


Lemma  8.4:      If  Assumption  6.1    is  satisfied,    then  for  power  series  Assumption 
3.4   is  satisfied  and  for  any     d   >  0      there   is  a  constant      C     such   that   for  all 


Proof:   First,  assume  that  q   is  nonexistent,  and  let  q  =  q  .   Following 

(a) 
the  definitions  in  Abramowitz  and  Stegun  (1972,  Ch.  22),  let  C    ix)      denote 

..(a) 
the  ultraspherical  polynomial  of  order  k  for  exponent  a,   n    = 

Tr2^~^°'r(k+2a)/{k!  {k+a)[r(a)]^},   and  p^^Ux)   =    [h'-'^h'^'^^C^'^Ux).      Also,  let 

12    2   1 
x.{q.)    =    (2q  .-q  .-q  .)/(q  .-q  .)   and  define 
J   J       J   J   J    J   J 


(i^  +.5) 

P.  (q)  =  n.^.p  .\,  ,  (a:.(q.)), 
k  ^     j=r  A  (k)   J  ^j 


K  K 

P  (q)   is  a  nonsingular  combination  of  p  (q)   by  the  "natural  ordering"
assumption  (i.e.  by   |X(k)|   monotonic  increasing).   Also,  for  P(q)

absolutely continuous on $Q = \prod_{j=1}^r [q_j^1, q_j^2]$ with pdf proportional to
$\prod_{j=1}^r [(q_j - q_j^1)(q_j^2 - q_j)]^{\nu_j}$, and by the change of variables $x_j(q_j)$, there is a
constant $C$ with

$$\lambda_{\min}\Big(\int P^K(q)P^K(q)'\,dP(q)\Big) \ge \lambda_{\min}\Big(\int \otimes_{j=1}^r \big[P_M^{(\nu_j+.5)}(x_j(q_j))P_M^{(\nu_j+.5)}(x_j(q_j))'\big]\,dP(q)\Big) = C,$$

where the inequality follows by $P^K(q)$ a subvector of $\otimes_{j=1}^r [P_M^{(\nu_j+.5)}(x_j(q_j))]$
for $M = \max_{k \le K}|\lambda(k)|$ and $P_M^{(\alpha)}(x) = (p_0^{(\alpha)}(x), \ldots, p_M^{(\alpha)}(x))'$.
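The orthonormality that drives the eigenvalue bound above can be checked numerically. The following is only an illustrative sketch, not part of the original argument: it assumes NumPy is available, fixes the hypothetical values $\nu = 1$ (so $\alpha = 1.5$) and $M = 5$, builds the ultraspherical polynomials from their standard three-term recurrence, and integrates against the weight $(1-x^2)^{\nu}$ by Gauss-Legendre quadrature. The helper names `gegenbauer` and `h_norm` are made up for this example.

```python
import math
import numpy as np

def gegenbauer(k_max, alpha, x):
    """Rows are C_k^{(alpha)}(x), k = 0..k_max, from the three-term recurrence."""
    C = [np.ones_like(x), 2.0 * alpha * x]
    for n in range(2, k_max + 1):
        C.append((2.0 * (n + alpha - 1.0) * x * C[n - 1]
                  - (n + 2.0 * alpha - 2.0) * C[n - 2]) / n)
    return np.array(C[:k_max + 1])

def h_norm(k, alpha):
    """Normalizing constant h_k^{(alpha)} of Abramowitz and Stegun, Ch. 22."""
    return (math.pi * 2.0 ** (1.0 - 2.0 * alpha) * math.gamma(k + 2.0 * alpha)
            / (math.factorial(k) * (k + alpha) * math.gamma(alpha) ** 2))

nu = 1                    # illustrative exponent (an assumption of this sketch)
alpha = nu + 0.5
M = 5                     # highest polynomial order used here
x, w = np.polynomial.legendre.leggauss(20)   # exact for the polynomial integrands below
C = gegenbauer(M, alpha, x)
p = np.array([C[k] / math.sqrt(h_norm(k, alpha)) for k in range(M + 1)])
weight = (1.0 - x ** 2) ** nu                # ultraspherical weight for alpha = nu + .5
gram = (p * (w * weight)) @ p.T              # integral of P_M(x) P_M(x)' against the weight
smallest_eigenvalue = np.linalg.eigvalsh(gram)[0]
```

Under these assumptions `gram` is the identity matrix up to rounding, so its smallest eigenvalue equals one, which is the univariate version of the constant $C$ in the display.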

Next, by differentiating 22.5.37 of Abramowitz and Stegun (for $m$ there
equal to $\nu$ here) and solving, it follows that for $\ell \le k$,
$d^\ell C_k^{(\nu+.5)}(x)/dx^\ell = C \cdot C_{k-\ell}^{(\nu+.5+\ell)}(x)$, so that by 22.14.2 of Abramowitz and Stegun,
for $\lambda(k-s)$ as in equation (2.3),

|aV^(q)|  s   Cn.^,[l.A,(k-s)]"''''j'^^J  s   C|A(k-s)r^-^^^"^^  ^  CK' ^"■^^^X 


where  the  last  equality  follows  by   |A(k-s)|  s  CK 


1/a 


Now, for the case with $q_1$, let $P_{kK}(q) = q_{1k}$, $k = 1, \ldots, s$, and $P_{kK}(q) =
P_{k-s,K}(q_2)$ for $P_{kK}(q)$ described above. The bounds of the previous equation
continue to hold by $q_1$ bounded. Let $P(q_2) = (P_{s+1,K}(q), \ldots, P_{KK}(q))'$ and
$\mathcal{H}_{2K}$ the linear space spanned by $P(q_2)$. Note $\mathcal{H}_{2K} \subseteq \mathcal{H}_2$, so that
$M_K = E[\{q_1 - P(q_1|\mathcal{H}_{2K})\}\{q_1 - P(q_1|\mathcal{H}_{2K})\}']$ is bigger than $M$, and thus has smallest
eigenvalue bounded away from zero by Assumption 6.1. Furthermore,
$E[P^K(q)P^K(q)'] = BDB'$ for

$$B = \begin{pmatrix} I & E[q_1P(q_2)'](E[P(q_2)P(q_2)'])^{-1} \\ 0 & I \end{pmatrix}, \qquad
D = \begin{pmatrix} M_K & 0 \\ 0 & E[P(q_2)P(q_2)'] \end{pmatrix}.$$

Thus, by the extremal characterization of the smallest eigenvalue,
$\lambda_{\min}(E[P^K(q)P^K(q)']) \ge \lambda_{\min}(BB')\lambda_{\min}(D) \ge C\min\{\lambda_{\min}(M_K), \lambda_{\min}(E[P(q_2)P(q_2)'])\} \ge C$.  ■
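The block factorization used at the end of this proof can be illustrated with sample moments. The sketch below is not from the paper: it assumes NumPy, uses simulated data with made-up dimensions, and takes a cubic polynomial in a scalar `q2` as a stand-in for $P(q_2)$. It checks that the second-moment matrix of the stacked vector equals $BDB'$ and that its smallest eigenvalue is at least $\lambda_{\min}(BB')\lambda_{\min}(D)$.

```python
import numpy as np

rng = np.random.default_rng(0)
s, k2, n = 2, 4, 5000                       # hypothetical dimensions and sample size
q2 = rng.uniform(-1.0, 1.0, size=n)
P2 = np.vander(q2, k2, increasing=True)     # stand-in for P(q_2): (1, q2, q2^2, q2^3)
q1 = np.column_stack([np.sin(q2), q2 ** 2]) + rng.normal(size=(n, s))
PK = np.hstack([q1, P2])                    # stacked vector (q_1', P(q_2)')'

S = PK.T @ PK / n                           # sample analogue of E[P^K(q) P^K(q)']
S11, S12, S22 = S[:s, :s], S[:s, s:], S[s:, s:]
G = S12 @ np.linalg.inv(S22)                # E[q1 P'] (E[P P'])^{-1}
M_K = S11 - G @ S12.T                       # second moment of the projection residual

B = np.block([[np.eye(s), G], [np.zeros((k2, s)), np.eye(k2)]])
D = np.block([[M_K, np.zeros((s, k2))], [np.zeros((k2, s)), S22]])

def lam_min(A):
    """Smallest eigenvalue of a symmetric matrix."""
    return np.linalg.eigvalsh(A)[0]

factorization_ok = np.allclose(B @ D @ B.T, S)
bound_ok = lam_min(S) >= lam_min(B @ B.T) * lam_min(D) - 1e-12
```

The inequality holds because $BDB' - \lambda_{\min}(D)BB' = B(D - \lambda_{\min}(D)I)B'$ is positive semi-definite.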

Lemma  8.5:      If  Assumption  7.1    is  satisfied   then  for  splines  Assumption  3.4   is 

K  K 

satisfied,    the  number  of  nonzero  elements  of     P  (q)P   (q)'      is  bounded  by     CK, 

and  for  any     d   >  0      there   is  a  constant     C     such   that   for  all      X     with      \X\    £ 

d     and     X  .  <  m.      (J=l,...,r),      it   follows   that      sup  ,^„|9  P,^(q)\    s 

J  q^Q, K— A    XA 


Proof:  First, consider the case where $q = q_2$ and let $Q = \prod_{j=1}^r [-1,1]$.

Let $B_{jL}(x)$ be the B-spline of order $m$, for the knot sequence

$-1 + 2j/[L+1]$, $j = \ldots, -1, 0, 1, \ldots$ with left end-knot $j$, and let


Pl^(q)  =n;iHA^(k)>0)P^_^^(^^(q^). 


Then existence of a nonsingular matrix $A$ such that $P^K(q) = Ap^K(q)$ for $q \in Q$
follows by inclusion in $p^K(q)$ of all multiplicative interactions of
splines for components of $q$ corresponding to components of $h(q)$ and the
usual basis result for B-splines (e.g. Theorem 19.2 of Powell, 1981).

Next,  by  a  well  known  property  of  B-splines,  ^px^^'-ip^^pic'  ^^p^   ~  ^     ^°^ 

all  q  €  R  if   |k-k'  I  >  m,   implying  that  the  number  of  nonzero  elements  of 

P*^(q)P^(q)'   is  bounded  above  by  2(m+l)'"K  =  CK.   Also,  for  P(q)   the 

distribution  of  r  independent  uniform  random  variables  on  Q,  noting  that 

-1/2 
[2(m+l)/L.]  [L„/2]    ^pv^'^p'^      ^""^  ^^^  so-called  normalized  B-splines  with 

evenly  spaced  knots,  it  follows  by  the  argument  of  Burman  and  Chen  (1989,  p. 

1587)  that  for  P^  l^^^  ^  ^^ll^^l^ ^i  L+m+l'^^l^^ '      ^^^^^   ^^     ^     "^^^^ 

X    .    {S     Po  , (q)Po  T  (q)'dq)  2  C  for  all  positive  integers  L.   Therefore,  the 
mm     c,  L    c,  L 

boundedness away from zero of the smallest eigenvalue follows by $P^K(q)$ a
subvector of $\otimes_{j=1}^r P_{o,L}(q_j)$, analogously to the proof of Lemma 8.4. Also,
since changing even knot spacing is equivalent to rescaling the argument of
B-splines, $\sup_x |\partial^d B_{jL}(x)/\partial x^d| \le CL^d$, $d \le m$, implying the bounds on
derivatives given in the conclusion. The proof when $q_1$ is present follows
as in the proof of Lemma 8.4.  ■

Proof of Theorem 4.1:  Note that Assumptions 3.2 and 3.3 imply that Assumption
9.1 is satisfied for J = 1 and y = y. The first conclusion then follows
by Lemma 9.9, and the second by Lemma 9.10.  ■


Proof of Theorem 4.2:  By reasoning as in the previous proof, the theorem
follows immediately from Lemma 9.11.  ■

Proof of Theorem 4.3:  Follows immediately from Theorem 4.2.  ■

Proof  of  Theorem  5.1:   By  Assumption  3.4,   P  (q)   Is  a  nonsingular  linear 

K  K         K 

combination  of  p  (q),   so  that  replacing  p  (q)  by  P  (q)  does  not  change 

A(h),   Z,  n,      or  Q.      Thus,  it  suffices  to  show  the  conclusion  with  this 

replacement. Note that $\|DB\| = [\mathrm{tr}(DB^2D')]^{1/2} \le \|D\|\lambda_{\max}(B^2)^{1/2} = \|D\|\lambda_{\max}(B)$
for any matrix $D$ and positive semi-definite $B$. Thus, for $F = \Omega^{-1/2}$, by
$\sigma^2(q) \ge C$, for a positive definite square-root $\Sigma^{-1/2}$,

(8.1)  $\|FA'\Sigma^{-1/2}\| = \{\mathrm{tr}[FA'\Sigma^{-1}AF']\}^{1/2} \le C\{\mathrm{tr}[FA'\Sigma^{-1}V\Sigma^{-1}AF']\}^{1/2} \le C\sqrt{n}$,

       $\|FA'\Sigma^{-1}\| \le \|FA'\Sigma^{-1/2}\|\lambda_{\max}(\Sigma^{-1/2}) \le C\sqrt{n}$.

—  1  K  K 

Note that $A'\Sigma^{-1}A$ is invariant to replacing $P^K(q)$ by the $C_KP^K(q)$ from
Assumption 5.2, so that in analyzing the properties of $A'\Sigma^{-1}A$ the $j$th
element of $P^K(q)$, and hence the $j$th row of $A$, is invariant to $K$. It
then follows that $A'\Sigma^{-1}A$ is a monotonic increasing sequence in the positive
semi-definite semi-order (since this matrix is formally identical to the
inverse asymptotic variance of a minimum chi-square estimator, which rises as
additional equality restrictions are added). Therefore, the smallest
eigenvalue of $A'\Sigma^{-1}A$ is also monotonic increasing in $K$, so that by $A$
having full rank for some $K$, the smallest eigenvalue of $A'\Sigma^{-1}A$
is bounded away from zero. This implies that
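The monotonicity claim can be illustrated numerically. The sketch below is not from the paper: it assumes NumPy, draws an arbitrary positive definite matrix to play the role of $\Sigma$ for the largest $K$, and treats the leading $K \times K$ block and the first $K$ rows of a fixed matrix as the nested $\Sigma$ and $A$ for smaller $K$. The name `info` is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
K_max, m = 8, 2                              # hypothetical dimensions
R = rng.normal(size=(K_max, K_max))
Sigma_full = R @ R.T + np.eye(K_max)         # positive definite second-moment matrix
A_full = rng.normal(size=(K_max, m))         # row j does not change as K grows

def info(K):
    """A' Sigma^{-1} A built from the first K series terms."""
    return A_full[:K].T @ np.linalg.solve(Sigma_full[:K, :K], A_full[:K])

mats = [info(K) for K in range(m, K_max + 1)]
# Smallest eigenvalue of each successive difference: nonnegative means psd-increasing.
gaps = [np.linalg.eigvalsh(b - a)[0] for a, b in zip(mats, mats[1:])]
min_eigs = [np.linalg.eigvalsh(Q)[0] for Q in mats]
```

Each entry of `gaps` is nonnegative up to rounding, and `min_eigs` is nondecreasing, matching the statement that the smallest eigenvalue of $A'\Sigma^{-1}A$ cannot fall as terms are added.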

(8.2)  $\|F\| \le C\lambda_{\max}(\Omega^{-1})^{1/2} = C\lambda_{\min}(\Omega)^{-1/2} \le C\sqrt{n}$.

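Also, $\hat\Sigma^{-1} - \Sigma^{-1} = \hat\Sigma^{-1}(\Sigma-\hat\Sigma)\Sigma^{-1} = \Sigma^{-1}(\Sigma-\hat\Sigma)\Sigma^{-1} + (\hat\Sigma^{-1}-\Sigma^{-1})(\Sigma-\hat\Sigma)\Sigma^{-1}
= \Sigma^{-1}(\Sigma-\hat\Sigma)\Sigma^{-1} + \hat\Sigma^{-1}(\Sigma-\hat\Sigma)\Sigma^{-1}(\Sigma-\hat\Sigma)\Sigma^{-1}$, and by Lemma 9.6 and Assumption 3.7,
$\|\hat\Sigma-\Sigma\| = o_p(1)$, $\lambda_{\max}(\hat\Sigma^{-1}) = O_p(1)$, and there are positive semidefinite square
roots $\Sigma^{-1/2}$, $\hat\Sigma^{-1/2}$ such that $\lambda_{\max}(\Sigma^{-1/2}) = O(1)$ and $\lambda_{\max}(\hat\Sigma^{-1/2}) = O_p(1)$. It follows
that $\|A\hat\Sigma^{-1}\| \le C\|A\hat\Sigma^{-1/2}\|$, so that w.p.a.1,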

--!/?    2  -1/7 

IIFA'Z  II      ^  Cn[l    +  A        (Z  )IIZ-ZII(1    +    IIZ-SIIO    (1))]    =  0    (n), 

max  p  p 

--1   2  --1/2  2  --1/2  2 

IIFA'Z      II      <    IIFA'Z    '^irA         (Z  )      =  0    (n), 

max  p 
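The expansion of the difference between two inverses that underlies these bounds is a purely algebraic identity and is easy to verify numerically. The following sketch assumes NumPy; the matrices are arbitrary stand-ins for $\Sigma$ and $\hat\Sigma$ and are not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(2)
K = 5
R = rng.normal(size=(K, K))
Sigma = R @ R.T + K * np.eye(K)              # stand-in for Sigma (positive definite)
E = rng.normal(size=(K, K))
Sigma_hat = Sigma + 0.1 * (E + E.T)          # symmetric perturbation, stand-in for Sigma-hat

Si = np.linalg.inv(Sigma)
Shi = np.linalg.inv(Sigma_hat)
d = Sigma - Sigma_hat

lhs = Shi - Si
one_step = Shi @ d @ Si                      # Sigma-hat^{-1} (Sigma - Sigma-hat) Sigma^{-1}
two_step = Si @ d @ Si + Shi @ d @ Si @ d @ Si
```

Both `one_step` and `two_step` reproduce `lhs`, so the first-order term involves only $\Sigma^{-1}$ and the remainder is quadratic in $\Sigma - \hat\Sigma$.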

Let  TT  be  such  that   |h(q)  -  P^(q)'ji|     ^  CK~"-   For  h  = 

Q  ,  V 

(h(q^),....h(q^))'. 

F[A(h)  -  A(hQ)]  =  FA'Z~V'u/n  +  R,   R  =  F[A' t'"^?'  {h-Pn)/n+A' n-Aih^)] , 

where,  for  convenience,  the  K  superscript  on  P   has  been  dropped.   By  vi) 
and  (8.2),  w. p. a. 1, 

(8.3)  IIRII   ^   IIFA'Z~'^P/V^llllh-PSll/v/n  +    IIFIIIIA'Tt  -  A(hQ)ll 

£  C[tr(FA'z"4z~-^AF')]^^^  +  ^Ik""  s  C^nK""  =  o    (1). 

Also,  FA'Z~-^P'u/n  =  FA'Z~^P'u/n  +  R.   R  =  FA' (Z~^-Z~-^  )P'u/n.   By  the  proof  of 

Lemma  9.8,   liz"-^^^' u/V^II  =  0  {y}''^)      and  by  Lemma  9.6.   II(Z-Z)Z~    II  £ 

P 




-  —1  /?  1  /?  "? 

IIZ-ZIIA        (S  )    =  0    (K       Cn(K)   /VK).      Therefore, 

max  p  0 

(8.4)  IIRII   =   IIFA'i~-^(2-Z)Z~-^P'u/nll   :£   IIFA'Z'VyHlIll  (E-Z)S~^'^^II  IIZ~-^''^P'u/V^II 

s  CO    (K^'^^K^^^<„(K)^/v^)   =  o    (1). 
P  0  p 


Next,  let  I'  be  a  constant  vector  with  llvll  =  1  and  the  same  dimension 

as  A(h),   and  Z.   =  p'FA'Z~^P^(q, )u,/v^,   so  that  r",Z.  /Vn  = 
in  ^11  ^i=l  in 

v'FA'Z  P'u/n.   Note  that  Z.    is  a  martingale  difference  sequence,  and 

9  _1     If  1  /o 

E[Z^^]    =   1.      Also,     IZ^^I    £   II/JIIIIFA'Z      IIIIP    (qJlllu^l/'/S  £  CK        Cq(K)|u^|.      By 

2 
Assumption  5.1,  for  a  =  s/2,   u[l-(2/a)]  >  1   and   |u.  I   <  co.   Thus, 

1  0. 

(IZ^^I^)^  -  CK^Co^^^'^^'^i'^'^  "  ^^<o^^^'^-   ^^^"^  '^^  Davydov's  (1968) 

2 

inequality  and  K  s  KK, 

E[(j:.Z^./n  -  1)^]  s  i  (C/n)[j:^"^r*'^^  "  ^''^^KIZ^  I  )^  ^  CK^C-(K)Vn  =  o(l), 
'^i  ni  ^t=0  in  ci  0 

so  that  r.Z  ./n  -^  1.   Also,  for  any  e  >  0, 
^1  ni  .       J-       . 

E[1(Z^  >  e-n)Z^  ]  £  E[Z^  ]/(n6)  s  CkV  (K)\[  |u.  |'^]/(ne)  =  o(l). 
m        in       m  0        i 

It  then  follows  by  Theorem  5.2.3  of  White  (1984)  that  Y.^ J..    /Vn   -^  N(0,  1). 

*"i  =  l  in 

Since this result holds for all $\nu$ with $\|\nu\| = 1$, $FA'\Sigma^{-1}P'u/n \stackrel{d}{\to} N(0,I)$

follows by the Cramer-Wold device. The first conclusion then follows by eqs.

(8.3), (8.4), and the triangle inequality.

To prove the second conclusion, note that Assumption 5.1 implies that the
hypotheses of Theorem 4.1 are satisfied. Thus, for $\hat u_i = y_i - \hat h(q_i)$,

$$\sum_i |\hat u_i - u_i|^2/n = \sum_i |\hat h(q_i) - h(q_i)|^2/n = O_p(K/n + (nK^{-2\alpha})/n) = O_p(K/n) = o_p(1).$$

Therefore, by $|\hat u_i^2 - u_i^2| \le 2|u_i||\hat u_i - u_i| + |\hat u_i - u_i|^2$,

(8.5)  $\sum_i |\hat u_i^2 - u_i^2|/n \le O_p(1)\{\sum_i |\hat u_i - u_i|^2/n\}^{1/2} = O_p(K^{1/2}/n^{1/2})$.

Let $\hat V$ and $\hat\Sigma$ be as defined above, except with $P^K(q)$ replacing $p^K(q)$,
and let $\tilde V = \sum_i P^K(q_i)P^K(q_i)'u_i^2/n$. Then

(8.6)      IIV-VII  s  K^'^^^QCfO^EilV^i'^"  "  °  (K^^^Co(*^)^K^^^/n^^^)  =  o  (1). 

2 

Next,  apply  Lemma  9.6,  with  y  =  P   (q.  )P^(q  )u  ,   noting  that  by 

s  >  i^l/{^l-l),    Assumption  9.1  is  satisfied  for  J  =  K,  v    U)    =   C„(K)  ,   B  , 

y       0       yi 

2  _  ~  1  /?    _  7 

u7.   and  s   there  equal  to  2s,   so  that   IIV-VII  =  0  (K       C, AK)   /Vn)    =  o  (i: 

It  now  follows  by  the  triangle  inequality  that   IIV-VII  =  o  (1).   Therefore, 

IIF[n-n]F'll    <   II FA 'Z"-^  II ^IIV-VII    +    IIF[A'f:~'^(Z-S)Z~-^VZ~^(Z-f:)z"-^A]F'll 

+   2IIF[A'Z~-^(Z-Z)Z~'^VZ~^A]F'II 

s  o    (1)    +   o    (1)X         (Z~-^)    +   o    (DA         (1)    =   o    (1), 
p  p  max  p  max  p 


giving  the  second  conclusion.   If  the  final  hypothesis  is  satisfied,  then 

,l/2-,l/2    _,l/2  ,      ,.   .,    ^  ,v,  ,    ,   ,,  ^,1/2    ,l/2_,l/2 

w       Q         — >  i2„    by  continuity  of  the  square  root  and   (i//  Q)    =  i/»   Q 
n  0  n         n 

1  yy    1  yy 
Therefore,  multiplying  through  by  0   Q        ,   it  follows  that  i//  Q   — >  Q  , 

^   K   o       -1     .K  V   /-1/2--1/2  p   -1/2   _,    ^      ^-1/2-1/2 
and,  by  Q   nonsingular,  that  i//    Q     — ^  Q    .   Therefore,  U         Q 

-^   I,   so  the  final  conclusion  follows  from  the  first  conclusion. 

Proof of Theorem 5.2:  By iii) and each component of $p^K(q)$ an element of $\mathcal{H}$,
$A = E[p^K(q)\delta(q)']$. Let $\delta_K(q)' = p^K(q)'\Sigma^{-1}A = p^K(q)'\Sigma^{-1}E[p^K(q)\delta(q)']$. Since
$\delta_K(q)$ is the minimum mean-square error linear combination of $p^K(q)$, it
follows by ii) and $\sigma^2(q)$ bounded that $E[\|\delta(q)-\delta_K(q)\|^2] \to 0$ and
$E[\sigma^2(q)\|\delta(q)-\delta_K(q)\|^2] \le CE[\|\delta(q)-\delta_K(q)\|^2] \to 0$. Therefore, $A'\Sigma^{-1}A =
E[\delta_K(q)\delta_K(q)'] \to E[\delta(q)\delta(q)']$, so that Assumption 5.2 is satisfied by iii).
Also,

$$\|n\Omega - \Omega_0\| = \|E[\sigma^2(q)A'\Sigma^{-1}p^K(q)p^K(q)'\Sigma^{-1}A] - \Omega_0\|
= \|E[\sigma^2(q)\delta_K(q)\delta_K(q)'] - E[\sigma^2(q)\delta(q)\delta(q)']\| \le E[\sigma^2(q)\|\delta(q)-\delta_K(q)\|^2] \to 0,$$

so that the final hypothesis of Theorem 5.1 is satisfied. The conclusion
follows by the final conclusion of Theorem 5.1.  ■


Proof of Theorem 5.3:  If $A(h)$ is not mean square continuous then there
exists a sequence $h_j(q) \in \mathcal{H}$, $(j = 1, 2, \ldots)$ such that $E[h_j(q)^2] \to 0$ and
$|A(h_j)|$ is bounded away from zero. Consider any parametric submodel such
that $P(y|\mathcal{H}) = h(q) + \gamma h_j(q)$, with true value of $\gamma$ equal to zero. By
Chamberlain (1987) the supremum over all such submodels of the Cramer-Rao
variance bound is the asymptotic variance of the least squares estimator,
which is $(E[h_j(q)^2])^{-2}E[\sigma^2(q)h_j(q)^2]$. Furthermore, by the delta-method and
$\sigma^2(q)$ bounded away from zero, the corresponding supremum for $A(h)$ is

$$[\partial A(h+\gamma h_j)/\partial\gamma]^2(E[h_j(q)^2])^{-2}E[\sigma^2(q)h_j(q)^2] \ge CA(h_j)^2(E[h_j(q)^2])^{-1} \to \infty.$$

Therefore, the supremum over all parametric submodels of Cramer-Rao bounds
for $A(h)$ is not finite.  ■

Proof  of  Theorem  5.4:   Given  in  Section  5.   ■ 
Proof  of  Theorem  5.5:   Given  in  Section  5.   ■ 

Proof of Theorem 6.1:  Proceed by verifying the hypotheses of Theorems 4.1
and 4.2. Assumption 3.4 with $\zeta_0(K) = K^{.5+\nu}$ follows by Assumption 6.1 and
Lemma 8.4. Assumptions 3.5 and 3.6 follow by the assumption $|\lambda(k-s)|$ is
increasing, which implies that the products of univariate orthogonal
polynomial terms form a nested sequence. Assumption 3.7 follows by Assumption
6.2 and $\tilde K = \bar K^2$. Assumption 3.8 with $d = 0$ follows by Lemma 8.1. The first
two conclusions then follow from Theorem 4.1. The third line follows from
Theorem 4.2 similarly. The final conclusion follows from Theorem 4.2, the
bound on $\zeta_{|\lambda|}(K)$ from Lemma 8.4, and Lemma 8.2 (which implies Assumption 3.8 for
any $\alpha > 0$).  ■


Proof of Theorem 6.2:  By Theorem 5.1 it suffices to show that Assumption 5.3
is satisfied. Assumption 3.4 with $\zeta_0(K) = K^{.5+\nu}$ follows by Assumption 6.1
and Lemma 8.4. Assumptions 3.8 and 3.9 and $\sqrt{n}K^{-\alpha} \to 0$ follow by ii) and
Lemmas 8.1 and 8.2. Finally, note that $\tilde K = K^2$, so that $\tilde KK\zeta_0(K)^4/n =
K^3K^{2+4\nu}/n = K^{5+4\nu}/n \to 0$ by i).  ■


Proof  of  Theorem  6.3:   Proceed  to  verify  hypotheses  of  Theorem  6.3. 

Assumptions  3.1,  3.2,  and  5.1  are  satisfied  by  the  independence  of  the 

observations.   Note  that  A(h)  =  h  .  (q  .  )-h  .  (q  .)  =  h(q, q q  )  -  h(q), 

J   J   J   J       1      J      r 

is  continuous  with  respect  to   |h|„   ,   while  A(h)  =  d   h.(qj/5q.   = 

0,00  J   J      J 

3  h(q)/3q.    is  continuous  with  respect  to   |h|  ,   ,   so  that  ii)  of  Theorem 
J  '^  d.oo 

-1/2 
6.3  is  satisfied.   The  conclusion  then  follows  by  taking  ip     =  Q         .        u 

Proof  of  Theorem  6.4:   Proceed  from  Theorem  5.2.   It  follows  by  A(h)  = 
E[5(q)h(q)],   as  in  the  proof  of  Theorem  6.3,  that  Assumption  5.3  is  satisfied 
with  V  =  2.   Theorem  5.2  ii)  follows  by  the  well  known  spanning  result  for 
power  series  for  bounded  q   (e.g.  Gallant,  1980),  giving  the  result.    ■ 

Proof  of  Theorem  6.5:   Follows  from  Theorem  5.3  by  the  same  argument  used  in 
the  proof  of  Theorem  6. 5.    ■ 

Proof  of  Theorem  7.1:   Proceed  by  verifying  the  hypotheses  of  Theorems  4.1  and 
4.2.   Assumption  3.4  with  Cp,(K)  =  K'    follows  by  Assumption  7.1  and  Lemma 

8.5.   Assumptions  3.5  and  3.6  follow  trivially  by  K  constant.   Assumption 

2 
3.7  follows  by  Assumption  7.2  and  Lemma  8.5,  which  implies  K  =  ^  CK  . 

Assumption  3.8  with  follows  by  Lemma  8.3.   The  conclusions  then  follow  from 
Theorems 4.1 and 4.2, with the bound on $\zeta_{|\lambda|}(K)$ from Lemma 8.5.  ■

Proof of Theorem 7.2:  By Theorem 5.1 it suffices to show that Assumption 5.3
is satisfied. Assumption 3.4 with $\zeta_0(K) = K^{.5}$ follows by Assumption 7.1
and Lemma 8.5. Assumptions 3.8 and 3.9 and $\sqrt{n}K^{-\alpha} \to 0$ follow by ii) and
Lemma 8.3. Finally, note that $\tilde K \le CK$ by Lemma 8.5 so that $\tilde KK\zeta_0(K)^4/n \le
CK^2K^2/n = CK^4/n \to 0$ by i).  ■


Proof  of  Theorem  7.3:   Follows  analogously  to  the  proof  of  Theorem  6.3. 


Proof  of  Theorem  7.4:   Follows  analogously  to  the  proof  of  Theorem  6.5, 
except  for  the  explicit  formula  for  5(q)  given,  which  follows  by 
J'w(q.)[a^h.(q.)/aq'!]dq.  =  Jw(q)[5'^h(q)/aq'^]dq  for  w(q)  =  w(q  )f  (q_  . ) ,   where 
[d^wiq)/aq^]/f(.q)    =    [a'^wCq  .  j/Sq'!]  f  (q_  .  )/f  (q)  =  [a^w(q  .  j/aq'^J/f  (q  .  |  q_  . ) . 


9.    Useful  Lemmas 

This  Section  gives  general  results  on  convergence  rates  for  certain 
remainder  terms.   It  is  useful  to  allow  throughout  for  a  vector  of  series 
estimates  with  dimension  that  can  increase  with  sample  size.   To  do  so,  it  is 
necessary  to  introduce  more  notation  and  assumptions. 

Let $\{y_{jJ}\}_{j=1,J=1}^{J,\infty}$ be a collection of functions of a single data
observation $z$. For notational convenience, the $J$ subscript on $y_{jJ}$ will
be suppressed in what follows. The results will pertain to certain sample
averages of these functions, or of series estimators of the projections $h_j(q)$
of $y_j$ on $\mathcal{H}$. Denote the observations on $y_j$ by $y_{ij}$, and let $u_{ij} \equiv
y_{ij} - h_j(q_i)$ and objects without a subscript denote corresponding vectors of $n$
or $J$ observations, e.g. $y_j = (y_{1j}, \ldots, y_{nj})'$ and $u_i = (u_{i1}, \ldots, u_{iJ})'$. For
$K = K(z_1, \ldots, z_n, n)$, let $p = [p^K(q_1), \ldots, p^K(q_n)]'$. The estimators are

$$\hat h_j(q) = p^K(q)'(p'p)^-p'y_j.$$
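As a concrete reading of the estimator formula, the sketch below fits a series regression with a generalized inverse. It is only an illustration under stated assumptions: NumPy is assumed, the approximating functions are taken to be plain powers of a scalar `q` (a power series with $r = 1$), the data are simulated, and the function name `series_fit` is made up for this example.

```python
import numpy as np

def series_fit(q, y, K):
    """Return q_new -> p^K(q_new)' (p'p)^- p'y for a power-series basis in a scalar q."""
    p = np.vander(q, K, increasing=True)             # n x K matrix with rows p^K(q_i)'
    pi_hat = np.linalg.pinv(p.T @ p) @ p.T @ y       # Moore-Penrose inverse as the g-inverse

    def h_hat(q_new):
        basis = np.vander(np.atleast_1d(q_new), K, increasing=True)
        return basis @ pi_hat

    return h_hat

rng = np.random.default_rng(3)
q = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 + 2.0 * q - q ** 2          # noiseless quadratic, so K = 3 terms fit it exactly
h_hat = series_fit(q, y, K=3)
grid = np.linspace(-1.0, 1.0, 5)
fitted = h_hat(grid)
```

With noiseless data generated from a quadratic, the fitted values on `grid` coincide with the true function, since the projection lies in the span of the first three powers.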

Assumption 9.1:  For $s > 1$ and $\nu_y(J)$, $\max_{j \le J}|u_{ij}| \le \nu_y(J)B_{yi}$, $|B_{yi}|_s < \infty$,
$E[B_{yi}^2|q_i] \le C$ and either a) $z_i$ is uniform ($\phi$) mixing with mixing
coefficients $\phi(t) = O(t^{-\mu})$, $(t = 1, 2, \ldots)$, $\mu > 2$, or b) there exists
$c(t)$ such that $\sum_{t=1}^\infty c(t) < \infty$ and $\max_{j \le J}|E[u_{ij}u_{i+t,j}|q_i,q_{i+t}]| \le c(t)\nu_y(J)^2$,
$(t = 1, 2, \ldots)$.

Henceforth, let $\sum_{i=1}^n = \sum_i$, $\sum_{t=1}^\infty = \sum_t$, and $\sum_{j=1}^J = \sum_j$.


The first few Lemmas consist of useful convergence results for random
matrices with dimension that can depend on sample size. Let $\hat\Sigma$ and $\Sigma$
denote symmetric matrices of this kind, and $\lambda_{\max}(\cdot)$ and $\lambda_{\min}(\cdot)$ the
largest and smallest eigenvalues respectively.


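Lemma 9.1:  If $\lambda_{\min}(\Sigma) \ge C$ with probability approaching one (w.p.a.1) and
$\|\hat\Sigma-\Sigma\| = o_p(1)$ then $\lambda_{\min}(\hat\Sigma) \ge C$ w.p.a.1.


Proof:  For a conformable vector $\mu$, it follows by $\|\cdot\|$ a matrix norm that
$\lambda_{\min}(\hat\Sigma) = \min_{\|\mu\|=1}\{\mu'\Sigma\mu + \mu'(\hat\Sigma-\Sigma)\mu\} \ge \lambda_{\min}(\Sigma) - \lambda_{\max}(\Sigma-\hat\Sigma) \ge \lambda_{\min}(\Sigma) - \|\hat\Sigma-\Sigma\| \ge
C - o_p(1)$. Therefore, $\lambda_{\min}(\hat\Sigma) \ge C/2$ w.p.a.1.  ■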
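The eigenvalue perturbation step in this proof can be checked on a small example. The sketch assumes NumPy and uses arbitrary matrices in place of $\Sigma$ and $\hat\Sigma$; the norm is the Frobenius norm, which NumPy's `norm` returns for matrices by default.

```python
import numpy as np

rng = np.random.default_rng(4)
K = 6
R = rng.normal(size=(K, K))
Sigma = R @ R.T / K + np.eye(K)              # stand-in for Sigma
E = rng.normal(size=(K, K))
Sigma_hat = Sigma + 0.05 * (E + E.T)         # stand-in for Sigma-hat

lam_min = np.linalg.eigvalsh(Sigma)[0]
lam_min_hat = np.linalg.eigvalsh(Sigma_hat)[0]
frob_gap = np.linalg.norm(Sigma_hat - Sigma) # Frobenius norm of the difference
lower_bound = lam_min - frob_gap             # bound from the proof of Lemma 9.1
```

Here `lam_min_hat` is never below `lower_bound`, so a vanishing `frob_gap` keeps the smallest eigenvalue of the estimate away from zero.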
p  mm 

Lemma 9.2:  If $\lambda_{\min}(\Sigma) \ge C$ w.p.a.1, $\|\hat\Sigma-\Sigma\| = o_p(1)$, and $D_n$ is a
conformable matrix such that $\|\Sigma^{-1/2}D_n\| = O_p(\epsilon_n)$ for some $\epsilon_n$, then
$\|\hat\Sigma^{-1/2}D_n\| = O_p(\epsilon_n)$.

Proof:  It is easy to show that for any conformable matrices $A$ and $B$, $\|AB\|
\le \|A\|\|B\|$, $\|A'BA\| \le \|B\|\|A'A\|$, and that if $B$ is positive semi-definite,
$\mathrm{tr}(A'BA) \le \|A\|^2\lambda_{\max}(B)$, $\|AB\| \le \|A\|\lambda_{\max}(B)$ and $\|BA\| \le \|A\|\lambda_{\max}(B)$. Let
$\Sigma^{-1/2}$ be the symmetric square root of $\Sigma^{-1}$, which is equal to $U\Lambda U'$ where $U$
is an orthogonal matrix and $\Lambda$ a diagonal matrix consisting of the square
roots of the eigenvalues of $\Sigma^{-1}$. Note that $\Sigma^{-1/2}$ is positive definite and
$\lambda_{\max}(\Sigma^{-1/2}) = [\lambda_{\max}(\Sigma^{-1})]^{1/2}$. Also by Lemma 9.1, $\lambda_{\max}(\hat\Sigma^{-1}) = O_p(1)$. Then

(9.1)  $\|\hat\Sigma^{-1/2}D_n\|^2 = \mathrm{tr}(D_n'[\Sigma^{-1} + \hat\Sigma^{-1} - \Sigma^{-1}]D_n)$

       $\le \|\Sigma^{-1/2}D_n\|^2(1 + \|\Sigma^{-1/2}[\Sigma-\hat\Sigma]\Sigma^{-1/2}\|) + \|(\Sigma-\hat\Sigma)\Sigma^{-1}D_n\|^2\lambda_{\max}(\hat\Sigma^{-1})$

       $\le O_p(\epsilon_n^2)[1 + o_p(1)O_p(1) + \|\hat\Sigma-\Sigma\|^2\lambda_{\max}(\Sigma^{-1/2})^2O_p(1)] = O_p(\epsilon_n^2)$.  ■

Let  tr(A)  denote  the  trace  of  a  square  matrix  A  and  u  a  random  matrix 
with  n  rows. 

Lemma 9.3:  Suppose $\lambda_{\min}(\Sigma) \ge C$ w.p.a.1, $\bar P$ is an $n \times \bar K$ random matrix
such that $\|\bar P'\bar P/n - \Sigma\| = o_p(1)$ and $\|\Sigma^{-1/2}\bar P'u/n\| = O_p(\epsilon_n)$, and $p = \bar PA$
where $A$ is a random matrix. Then $\mathrm{tr}(u'p(p'p)^-p'u/n) = O_p(\epsilon_n^2)$.


Proof:  Let $\bar W = \bar P(\bar P'\bar P)^-\bar P'$ and $W = p(p'p)^-p'$ be the orthogonal projection
operators for the linear spaces spanned by the columns of $\bar P$ and $p$
respectively. Since the space spanned by $p$ is a subset of the space spanned
by $\bar P$, $\bar W - W$ is positive semi-definite. Let $\hat\Sigma = \bar P'\bar P/n$. Then by Lemma 9.2,

$$\mathrm{tr}(u'Wu/n) \le \mathrm{tr}(u'\bar Wu/n) = \|\hat\Sigma^{-1/2}\bar P'u/n\|^2 = O_p(\epsilon_n^2).  ■$$
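The two facts used in this proof, that projecting on a smaller space cannot increase the trace and that the larger trace equals a squared norm, can be illustrated as follows. The sketch assumes NumPy, simulated matrices of arbitrary size, and the Moore-Penrose inverse as the generalized inverse; `proj` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(6)
n, K_bar, K, J = 200, 6, 3, 2                 # hypothetical dimensions
P_bar = rng.normal(size=(n, K_bar))
A = rng.normal(size=(K_bar, K))
p = P_bar @ A                                 # columns of p lie in the span of P_bar
u = rng.normal(size=(n, J))

def proj(X):
    """Orthogonal projection matrix onto the column space of X."""
    return X @ np.linalg.pinv(X.T @ X) @ X.T

W_bar, W = proj(P_bar), proj(p)
Sigma_hat = P_bar.T @ P_bar / n
vals, vecs = np.linalg.eigh(Sigma_hat)
Sigma_hat_inv_half = vecs @ np.diag(vals ** -0.5) @ vecs.T

small = np.trace(u.T @ W @ u) / n
large = np.trace(u.T @ W_bar @ u) / n
squared_norm = np.linalg.norm(Sigma_hat_inv_half @ P_bar.T @ u / n) ** 2
psd_gap = np.linalg.eigvalsh(W_bar - W)[0]    # smallest eigenvalue of W_bar - W
```

Here `small <= large`, `large` equals `squared_norm`, and `psd_gap` is zero up to rounding, in line with the display above.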

Let $Y$ and $G$ denote random matrices with the same number of columns and $n$
rows, and let $u = Y-G$. For a matrix $p$ let $\hat\pi = (p'p)^-p'Y$ and $\hat G = p\hat\pi$.

Lemma 9.4:  If $\mathrm{tr}(u'p(p'p)^-p'u/n) = O_p(\epsilon_n^2)$, then for any conformable matrix
$\pi$, $\|\hat G-G\|^2/n \le O_p(\epsilon_n^2) + \|G-p\pi\|^2/n$.

Proof:  For $\bar W$ and $W$ as in the proof of Lemma 9.3, by $Wp = p$, and $I-W$
idempotent,

$$\|\hat G-G\|^2/n = \mathrm{tr}[Y'WY - Y'WG - G'WY + G'G]/n = \mathrm{tr}[u'Wu + G'(I-W)G]/n$$

$$\le \mathrm{tr}[u'Wu + (G-p\pi)'(I-W)(G-p\pi)]/n \le O_p(\epsilon_n^2) + \|G-p\pi\|^2/n.  ■$$
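The decomposition in this proof separates a noise term from an approximation term, and it holds exactly for any data. A minimal numerical check, assuming NumPy and simulated inputs (the coefficient matrix `pi` is an arbitrary choice, as the lemma allows):

```python
import numpy as np

rng = np.random.default_rng(7)
n, K, J = 100, 4, 2                           # hypothetical dimensions
p = rng.normal(size=(n, K))
G = rng.normal(size=(n, J))                   # object being estimated
u = rng.normal(size=(n, J))
Y = G + u

W = p @ np.linalg.pinv(p.T @ p) @ p.T         # projection on the columns of p
G_hat = W @ Y                                 # equals p @ pi_hat
pi = rng.normal(size=(K, J))                  # any conformable coefficient matrix

total = np.linalg.norm(G_hat - G) ** 2 / n
noise = np.trace(u.T @ W @ u) / n
approx = np.trace(G.T @ (np.eye(n) - W) @ G) / n
bound = noise + np.linalg.norm(G - p @ pi) ** 2 / n
```

`total` equals `noise + approx` exactly, and `approx` is no larger than the approximation error of any fixed `pi`, which gives `total <= bound`.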

Lemma 9.5:  Suppose $\lambda_{\min}(\Sigma) \ge C$ w.p.a.1, $\bar P$ is an $n \times \bar K$ random matrix
such that $\|\bar P'\bar P/n - \Sigma\| = o_p(1)$ and $\mathrm{tr}[u'\bar P(\bar P'\bar P)^-\bar P'u/n] = O_p(\epsilon_n^2)$, and $p = \bar PS$
where $S$ is a random selection matrix. Then for any conformable matrix $\pi$,

$$\|\hat\pi-\pi\|^2 \le O_p(\epsilon_n^2) + O_p(1)\|G-p\pi\|^2/n,$$

$$\mathrm{tr}[(\hat\pi-\pi)'S'\Sigma S(\hat\pi-\pi)] \le O_p(\epsilon_n^2) + O_p(1)\|G-p\pi\|^2/n.$$


Proof:  By $p'p/n = S'(\bar P'\bar P/n)S$ for the selection matrix $S$, $\lambda_{\min}(p'p/n) \ge
\lambda_{\min}(\bar P'\bar P/n)$. Thus, $\lambda_{\min}(p'p/n) \ge C$ w.p.a.1, so $\lambda_{\min}(p'p/n)^{-1} = O_p(1)$.
Also, note that for $\bar W$ as above, $W = p(p'p)^-p'$, and $\tilde G = p\pi$,

$$\|\hat\pi-\pi\|^2 \le \lambda_{\min}(p'p/n)^{-1}\mathrm{tr}[(\hat\pi-\pi)'(p'p/n)(\hat\pi-\pi)]$$

$$= O_p(1)\mathrm{tr}[Y'WY - Y'W\tilde G - \tilde G'WY + \tilde G'\tilde G]/n$$

$$\le O_p(1)[\mathrm{tr}(u'\bar Wu/n) + \|G-\tilde G\|^2/n] = O_p(\epsilon_n^2) + O_p(1)\|G-\tilde G\|^2/n.$$

To prove the second conclusion, note that by the triangle inequality and the
same arguments as for the previous equation,

$$\mathrm{tr}[(\hat\pi-\pi)'S'\Sigma S(\hat\pi-\pi)] = \mathrm{tr}[(\hat\pi-\pi)'[S'\Sigma S-p'p/n](\hat\pi-\pi)] + \mathrm{tr}[(\hat\pi-\pi)'(p'p/n)(\hat\pi-\pi)]$$

$$\le \|\hat\pi-\pi\|^2\|S'\Sigma S-p'p/n\| + O_p(\epsilon_n^2) + O_p(1)\|G-\tilde G\|^2/n$$

$$\le [O_p(\epsilon_n^2) + O_p(1)\|G-\tilde G\|^2/n](1 + \|\Sigma - \bar P'\bar P/n\|) = O_p(\epsilon_n^2) + O_p(1)\|G-\tilde G\|^2/n.$$
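The first step of this proof relies on the fact that selecting a subset of columns cannot lower the smallest eigenvalue of the second-moment matrix. A small check, assuming NumPy, simulated data, and an arbitrary choice of selected columns:

```python
import numpy as np

rng = np.random.default_rng(8)
n, K_bar = 300, 7                             # hypothetical dimensions
P_bar = rng.normal(size=(n, K_bar))
keep = [0, 2, 5]                              # columns picked by the selection matrix
S = np.eye(K_bar)[:, keep]                    # K_bar x K selection matrix
p = P_bar @ S

moment_full = P_bar.T @ P_bar / n
moment_sel = p.T @ p / n
lam_full = np.linalg.eigvalsh(moment_full)[0]
lam_sel = np.linalg.eigvalsh(moment_sel)[0]
same_matrix = np.allclose(moment_sel, S.T @ moment_full @ S)
```

`moment_sel` is a principal submatrix of `moment_full`, so `lam_sel >= lam_full` by the extremal characterization of eigenvalues.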


The next results give convergence rates for sample averages with dimension
that can grow with sample size. For $\mu$ in Assumption 1 and $s$ in Lemma 9.6
below, let $a > 2\mu/(\mu-1)$ be as small as desired, and
^  f-(l/2)  +  (l/s)-(l/A),   s  :£  2fi/(/J-l)   ^  =  fl
I       -1/2        ,  s  >  2u/(u-l)      I 


-  (s/2a),  s  s  2m/(h-1) 
,  s  >  2(j/(/J-1)      ^    1/2   ,  s  >  2/i/(ji-l) 


Lemma 9.6:  If Assumption 3.1 is satisfied and there exists increasing $\nu_y(J)$
and $B_{yi}$ with $\max_{j \le J}|y_{ij}| \le \nu_y(J)B_{yi}$, $(J = 1, 2, \ldots)$, and $|B_{yi}|_s < \infty$
for $s > 1$, then

llE^y^/n  -  E[y^]ll  =  0  (A  (J)n"). 


Proof:  The proof for the case $s > 2\mu/(\mu-1)$ follows immediately from
applying Davydov's inequality to the covariance terms in $E[\|\sum_i y_i/n - E[y_i]\|^2]$.
The proof for the other case follows by a truncation argument analogous to
that used to prove weak laws of large numbers.  ■


For the $P^K(q)$ of Assumption 3.4 and $\bar K = \bar K(n)$ of Assumption 3.5, let $P(q)
= P^{\bar K}(q)$, $\Sigma = E[P(q_i)P(q_i)']$, $\hat\Sigma = \sum_{i=1}^n P(q_i)P(q_i)'/n$, and $\tilde K$ denote the
number of elements of $P(q)P(q)'$ that are nonzero at any point in $Q$.


Lemma 9.7:  If Assumptions 3.1, 3.4, and 3.7 are satisfied then

$$\|\hat\Sigma - \Sigma\| = O_p(\tilde K^{1/2}\zeta_0(\bar K)^2/\sqrt{n}).$$

Proof:  Apply Lemma 9.6 for $y_j = P_{k\bar K}(q)P_{\ell\bar K}(q)$ for all $k$ and $\ell$ such that
$P_{k\bar K}(q)P_{\ell\bar K}(q)$ is nonzero for some $q \in Q$. Here, $J = \tilde K$. Note that for all $q
\in Q$, $|P_{k\bar K}(q_i)P_{\ell\bar K}(q_i)| \le C\zeta_0(\bar K)^2$, so that Assumption 9.1 is satisfied for
$\nu_y(J) = \zeta_0(\bar K)^2$, $B_{yi} = 1$, and $s = \infty$. Thus, the conclusion follows by Lemma
9.6 with $s = \infty > 2\mu/(\mu-1)$.  ■

Let $y$, $h$, and $\hat h$ be $n \times J$ matrices with respective $ij$th elements $y_{ij}$,
$h_j(q_i)$, and $\hat h_j(q_i)$.

Lemma 9.8:  If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied, then

(9.5)  $\mathrm{tr}[(y-h)'p(p'p)^-p'(y-h)/n] = O_p(J\bar K\nu_y(J)^2/n)$.

Proof:   Let  u  =  y-h,   P.  =P^(q.),   Z  =  E[P.PM,   and  C . .  s 

1       1  11  jt 

E[P  P.^'u  u   .   ],   (0  s  t  <  n-i).   By  Assumption  9.1  and  iterated 
expectations,   EEP^^P'^^u^ .]  =  E[Pj^P^E[u? .  |q  ]  ]  ^v    (J)^Z.   Also,   E[P  u   ]  =0 
by  each  element  of  P   an  element  of  H     and  orthogonality  of  the  projection 
residual  with  H.      Then  under  the  uniform  mixing  condition  of  Assumption  9.  1 
a),  it  follows  by  Lemma  2.2  of  White  and  Domowitz  (1984)  that  for  any 
conformable  v  and  cU)    =  Ct 

(9.6)  v' [C.,+C.']v  £  ZCt'^^v'ElP.P'.u^  .]v  ^  ZcU)v    iJ)^v'Zv, 
Also,  under  Assumption  9.1  b),  for  c(i)  given  there, 

(9.7)  v' [C..+C.']v  £  2\E[v'P.P'.    ,i^E[u.  .u.  .  .Iq.,q.  ,]  ]  I  s  2v    (J)^cU)v' Zv. 

jt     jt  1  i+t  ij  i+i,  J  ^1  ^i+t  y 

2 

Thus,  Zv    (J)  c(t)S  -  (C..+C.',)   is  positive  semi-definite  in  either  case,  for 
y  Ji-    jt 

c(t)   as  in  (9.6)  or  (9.7).   Let  u.  =  (u, ,u  .)'.   Then  it  follows  that 

J     Ij      nj 

(9.8)  E[llz"-^''^P'ull^/n^]  =  r.-^^ELu'.PE'V'u.l/n^ 

s  Ji^  (J)^tr(E~-^''^[C  +  27"  c(i)]ZZ"'^^^)/n  =  0(JKi^  (J)^/n), 
y  ^t=0  y 

-1/2  —  1/2      -1/2 

giving   IIZ    P'u/nll  =  0  ((JK)    v    (J)n    ).   The  conclusion  then  follows  by 

p      y 

Lemma  9.3,  Lemma  9.7,  Assumption  3.5,  and  Assumption  3.7.    ■ 


Define 


5(h,K,d,v)  =   inf  max..  .  ,{E[|a'^{h(q.)-7r'p^(q.)}r]}^''''  +  exp(-exp(K) ) , 

TT     \  ^  \  —CI  1  1 

5(h,K,d,oo)  =   inf  max ..   sup    I  a'^[h(q.  )-7i'p'^(q)  ]  |  +  exp(-exp(K) ) . 
Lemma 9.9:  If Assumptions 3.1, 3.4, 3.5, 3.7, and 9.1 are satisfied,

p       y         J  K-N^K   J 

K      V   1/v 
Proof:   Let  jr  .„  be  such  that   {E[  |h,(q.  )-7r  'p  (q,  )  ]  |  ]  >^   s  5(h,,K,  0,  v). 

J'^  J   1    Jr>-      1-  J 

and  let  n     =  [tt  t^  ,v^  ■      By  Assumption  3.5,   5(h.,K,0,v)  s  5(h.,K,0,v), 

s.  Ik  Jk  j  j 

for  K  s  K  s  K.   Then  for  1-   the  indicator  function  for  the  event  K  s  K  £  K, 
E[l^llh-p'Trj^ll^/n]  ^   E[max^^j,^j^llh-p'7rj,ll^/n] 

2 

Then  by     1=1     w.  p.a.  1   and  the  Markov  inequality,      llh-p'7T/>ll   /n  £ 

V  ?/v  - 

0  iY,.{Y.^^„^^(h.,K,0,w)    }        ).      Also,  by  Lemma  9.8,   (y-h)'p(p'p)  p' (y-h)/n  = 

P   J   ^— K— Is.    J 

—  1/2      -1/2 
0  ((JK)    I'  (J)n    ).   The  conclusion  then  follows  by  Lemma  9.4.     ■ 


Lemma  9.10:      If  Assumptions  3.1.    3.4,    3.5.    3.7,    and   9.1   are  satisfied,    then 
for   the  distribution     F(q)  of     q, 

{5:.J[h.(q)-h.(q)]^dF(q)}^''^ 

P  y  J    K=:1S.-1S.      J 


Proof:   Let  n.       be  as  defined  in  the  proof  of  the  previous  Lemma,  with 

K  K 

P  (q)   replacing  p  (q).   Then  by  the  same  argument  as  there, 

Xl^llh(q)-P^(q)'7r-ll^dF(q)  s  i:j{5:j^^j,^^5(hj,  K,  0,  v)''}^"'''. 

Next,  apply  Lemma  9.5  with  S  =  jP^(q)P^(q)'dF(q),   P=[P^(q^) ^^^''n^^'' 

P  =  [P^(q^), . . . ,P^(q^)]',   S   the  selection  matrix  such  that  P^(q)  =  S'P^(q) 

-V        -  -  1/2      -1/2 

w.p.a.l,   71  =  (P'P)  P'y,   71  =  ir^.   G  =  h,   e^  =  (JK)    v    (J)n    .   From  the 




conclusion  of  Lemma  9.5  and  the  argument  in  the  previous  Lemma,  it  follows 
that 

Xllp'^(q)[n-7r^]ll^dF(q)  =  JllP^(q)  [ii-TT^]  ll^dF(q) 

=  tr{(n-7r^)'[JT^(q)P^(q)'dF(q)](n-Tr^)} 

=  tr[(jr-7r)'S'ZS(n-7r)]    :£  0   (e^)    +  0   (1)  IIG-P7rll^/n 

p     n  p 

=  °p^^^    *  Op(l)llh-P'7r^ll2/n  =  Op(.2)    .  0p(I.{I^^^^^5(h..K.0.v)^}2/-) 


Then  by  the  first  equation  of  this  proof,  and   1-  =  1   w.p.a.l, 

J'llh(q)-h(q)ll^dF(q)  s  C{  JllP^(q)  [rt-ir^]  ll^dF(q)  +  Jl^llh(q)-P^(q) '  Tij^ll^dFCq)  >, 


so  that  the  conclusion  follows. 


Lemma  9.11:      If  Assumptions  3.1,    3.4,    3.5,    3.7,    and  9.1   are  satisfied,      h(q) 
and  P   (q)   are  different iable  of  order      \\\      for  each     k  ^  K     and     K, 


"•^^sK^^eo'^^kK^^^I  ^<|A|^^^'   ^^'^  ^Ul^^^  ^<0^^^'   ^^^" 


sup^„lia^(q)  -  a^(q)ll  =0  (K^''^C,^|(K)[(K/n)^^^+{E.5(h..K.  Ul.co)^}^''^]). 

Proof:   Let  n.      e  R^  be  such  that,  for  A.  (q)  h  h .  (q)-P^(q) 'tt  . 
JK-  JK.        J  JK. 

sup  „|A.„(q)|  :£  5(h.,K,  |A|,oo)   and  sup  „  |  a\  .^(q)  I  :£  5(h  ..  K,  |X  |  ,  oo)   for 
qtU  JJ>>-  J  qsy    js.  J 

each  J  and  K.   Also,  let  tt   be  the  K  x  J  matrix  with  J    column  n. 

K.  jK. 

and  A^  the  n  x  J  matrix  with  ij    element  A.„(q.).   Note  that  by 
Assumption  3.7,   5(h  .,  K,  |  A| ,  oo)  ^  5(h  .,  K,  |  A| ,  oo)   w.p.a.l.   Thus,  for  P  = 

[P*^(q^),...,P^(q^)]',   w.p.a.l. 

(9.9)     llh-Pn^ll^/n  =  IIA^II^/n  =  J] -E- A -i^Cq- )^/n  i  j;  .5(h  .,  K,  |  A  | .  oo)^. 
K.  K.         J  1  jK.   1         J    J 
Next, let $Y = y$ and $G = h$. Note that columns of $P$ are a nonsingular
linear transformation of the columns of $p$, so that $P(P'P)^-P' = p(p'p)^-p'$.
Thus, by Lemma 9.8, $\mathrm{tr}[(Y-G)'P(P'P)^-P'(Y-G)]/n = O_p(K/n)$. Thus, by Lemma
9.5,

(9.10)  llTC-Tri>ll^  s  0    (K/n)    +0    (l)y'.5(h  ..K,  U| ,  oo)^ 

K  p  P  J        J  ~ 

=  Op([(K/n)^''^  +   {j:j5(hj.K,  lAl.o.)^}^^^]^). 

Noting  that  3  h(q)  =  n' d  P   (q),   it  then  follows  by  the  Cauchy-Schwartz 
inequality  that  for  any  q  e  Q, 

lia'^h(q)  -  a\(q)ll^  s  C{ll(^-Tr/>)'a'^P^(q)ll^  +  ll7r-'aV^(q)  -  a\(q)ll^> 

£  ll7r-7r-ll^lia'^P^(q)ll^  +  j:.A.-(q)^  ^   KC , .  ,  (K)^llii-7r"  11^  +  J]  .a(h  .,  K,  |  A  t ,  oo). 

r>-  J  JN-  I  A  I  f^.  J     J 


Since the term following the last inequality does not depend on n, the first
conclusion then follows from eq. (9.12).  ■